Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update eval modify index as part of plan apply. #3669

Merged
merged 5 commits into from
Dec 19, 2017

Conversation

preetapan
Copy link
Member

No description provided.

// EvalID is the eval ID of the plan being applied. We also update the modify
// index of the eval ID as part of applying plan results. This is to ensure that
// other workers that are dequeing evaluations don't miss updates that can affect
// scheduling decisions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"This is to ensure..." -> The modify index of the evaluation is updated as part of applying the plan to ensure that subsequent scheduling events for the same job will wait for the index that last produced state changes. This is necessary for blocked evaluations since they can be processed many times, potentially making state updates, without the state of the evaluation itself being updated.

@@ -286,6 +286,73 @@ func TestEvalEndpoint_Dequeue_WaitIndex(t *testing.T) {
}
}

func TestEvalEndpoint_Dequeue_UpdateWaitIndex(t *testing.T) {
// test enqueing an eval, updating a plan result for the same eval and dequeing the eval
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dequeuing

}
if existing == nil {
// return if there isn't an eval with this ID.
// In some cases (like snapshot restores), we process evals that are not already in the state store.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Snapshot restores don't go through the normal upsert codepath. They have there own, direct insertion to avoid these problems.

if existing == nil {
// return if there isn't an eval with this ID.
// In some cases (like snapshot restores), we process evals that are not already in the state store.
s.logger.Printf("[WARN] state_store: unable to find eval ID %v, cannot update modify index ", evalID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would do %q instead and there is an extra space on the end.

}
eval := existing.(*structs.Evaluation).Copy()
// Update the indexes
eval.CreateIndex = existing.(*structs.Evaluation).CreateIndex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the copy capture this one?

evalOut, err := state.EvalByID(ws, eval.ID)
assert.Nil(err)
assert.NotNil(evalOut)
assert.Equal(uint64(1000), evalOut.ModifyIndex)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert.EqualValues(1000, evalOut.ModifyIndex)

return fmt.Errorf("eval lookup failed: %v", err)
}
if existing == nil {
return fmt.Errorf("[ERR] state_store: unable to find eval id %q", evalID)
Copy link
Contributor

@dadgar dadgar Dec 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

err := fmt.Errorf("unable to find eval id %q", evalID)
s.logger.Printf("[ERR] state_store: %v", err)
return err

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this one, do you mind if the second line was just s.logger.Printf(err) so it doesn't double print [ERR] state_store?

@dadgar dadgar merged commit 259ee6b into master Dec 19, 2017
@dadgar dadgar deleted the b-planapply-eval-modindex branch December 19, 2017 00:10
@jippi
Copy link
Contributor

jippi commented Dec 19, 2017

What does this actually fix in symptoms and problems? Would be nice with some more context of the root cause and fix for it

@preetapan
Copy link
Member Author

@jippi It fixes a rare race that shows up under high load and limited resources. This manifested as subtle confusing bugs like duplicate deployments created, or reusing the same alloc index for a different allocation.

Bug: During high load and limited resources, if the scheduler made some placements but was blocked on creating more, and then got unblocked, it could miss the placements it already made.

Root cause: Scheduler workers processing an evaluation and not seeing state updates from previous plans that were applied associated with the same evaluation.

Fix : Update the eval's last modify index when plans are applied, so that scheduler workers can wait for the raft index to catch up before starting to process the eval.

@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 14, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants