Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Atomic eval insertion with job (de-)registration #8435

Merged
merged 5 commits into from
Jul 15, 2020
Merged

Conversation

notnoop
Copy link
Contributor

@notnoop notnoop commented Jul 14, 2020

This fixes a bug where jobs may get "stuck" unprocessed that
disproportionately affect periodic jobs around leadership transitions.
When registering a job, the job registration and the eval to process it
get applied to raft as two separate transactions; if the job
registration succeeds but eval application fails, the job may remain
unprocessed. Operators may detect such failure, when submitting a job
update and get a 500 error code, and they could retry; periodic jobs
failures are more likely to go unnoticed, and no further periodic
invocations will be processed until an operator force evaluation.

This fixes the issue by ensuring that the job registration and eval
application get persisted and processed atomically in the same raft log
entry.

Also, applies the same change to ensure atomicity in job deregistration.

Backward Compatibility

We must maintain compatibility in two scenarios: mixed clusters where a
leader can handle atomic updates but followers cannot, and a recent
cluster processes old log entries from legacy or mixed cluster mode.

To handle this constraints: ensure that the leader continue to emit the
Evaluation log entry until all servers have upgraded; also, when
processing raft logs, the servers honor evaluations found in both spots,
the Eval in job (de-)registration and the eval update entries.

When an updated server sees mix-mode behavior where an eval is inserted
into the raft log twice, it ignores the second instance.

I made one compromise in consistency in the mixed-mode scenario: servers
may disagree on the eval.CreateIndex value: the leader and updated
servers will report the job registration index while old servers will
report the index of the eval update log entry. This discrepancy doesn't
seem to be material - it's the eval.JobModifyIndex that matters.

Fixes #8219

This fixes a bug where jobs may get "stuck" unprocessed that
dispropotionately affect periodic jobs around leadership transitions.
When registering a job, the job registration and the eval to process it
get applied to raft as two separate transactions; if the job
registration succeeds but eval application fails, the job may remain
unprocessed. Operators may detect such failure, when submitting a job
update and get a 500 error code, and they could retry; periodic jobs
failures are more likely to go unnoticed, and no further periodic
invocations will be processed until an operator force evaluation.

This fixes the issue by ensuring that the job registration and eval
application get persisted and processed atomically in the same raft log
entry.

Also, applies the same change to ensure atomicity in job deregistration.

Backward Compatibility

We must maintain compatibility in two scenarios: mixed clusters where a
leader can handle atomic updates but followers cannot, and a recent
cluster processes old log entries from legacy or mixed cluster mode.

To handle this constraints: ensure that the leader continue to emit the
Evaluation log entry until all servers have upgraded; also, when
processing raft logs, the servers honor evaluations found in both spots,
the Eval in job (de-)registration and the eval update entries.

When an updated server sees mix-mode behavior where an eval is inserted
into the raft log twice, it ignores the second instance.

I made one compromise in consistency in the mixed-mode scenario: servers
may disagree on the eval.CreateIndex value: the leader and updated
servers will report the job registration index while old servers will
report the index of the eval update log entry. This discripency doesn't
seem to be material - it's the eval.JobModifyIndex that matters.
@notnoop notnoop self-assigned this Jul 14, 2020
Comment on lines +586 to +592
// always attempt upsert eval even if job deregister fail
if req.Eval != nil {
req.Eval.JobModifyIndex = index
if err := n.upsertEvals(index, []*structs.Evaluation{req.Eval}); err != nil {
return err
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that here we attempt to insert the eval even if the job deregistration fails. I find this behavior very odd but it's explicitly tested in

func TestJobEndpoint_Deregister_Nonexistent(t *testing.T) {
- with it insuring that the resulting eval is populated and have JobModifyIndex set ?! I kept the behavior the same

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this was intended to protect against the previous non-atomic behavior? Ex. if you deregister but the eval wasn't persisted, but then tried to deregister again, the job would be potentially gone but you'd still want an eval to clean it up?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually seems like a bug to me, but I think one that does not impact correctness (only waste some work doing nothing).

From peeking around generic_sched.go and reconcile.go, I'm not seeing us check if the evaluation is a Deregister. We always seem to checked if Job.Stopped is true! This means if we fail to update the statestore with the stopped Job, the Deregister evaluation will be the same as any other evaluation and end up being a noop for the job as it is not stopped and presumably already scheduled/allocated.

I suspect that @tgross is correct and that the test is merely asserting the non-atomic behavior existed: #981

Since we're making Job+Eval submissions atomic, I think we should at least trying to make this section atomic as well. I can't figure out where a Deregister eval for a non-Stopped job would have a desirable affect, but perhaps somebody else can find a case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to let the current behavior stand as-is, as stopping making an eval is a user visible change. Will follow up in another PR and we can discuss this issue further there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to let the current behavior stand as-is, as stopping making an eval is a user visible change.

I don't find this reason sufficient to keep it since the eval in question is a dereg but the effect would be a noop. If that's the behavior I feel like we're emitting a useless and actively misleading eval which should be treated like a bug and removed. There's no benefit to leaving it in place as anyone observing it would only be confused by its lack of affect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it should be changed - but I believe such a user-visible behavior (albeit a small one) change is outside the scope of this PR, so I will follow up in another PR.

nomad/job_endpoint.go Outdated Show resolved Hide resolved
Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I've left a few suggestions but overall this looks really good.

}
// COMPAT(1.1.0): Remove the ServerMeetMinimumVersion check.
// 0.12.1 introduced atomic eval job registration
if args.Eval != nil &&
Copy link
Member

@tgross tgross Jul 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this args.Eval != nil is always true; we're returning early from checking args.Eval == nil right above it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, do we need to remove the returning early above for multi region deployment? Returning early means that periodic/dispatch jobs will not be handled by multiregionStart? This PR doesn't change the behavior, but just noticed the MRD call.

Copy link
Member

@tgross tgross Jul 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Periodic and dispatch jobs get kicked off with their normal dispatch mechanisms in MRD. We special case them in schedule/reconcile.go, rather than running them through the MRD in deploymentwatcher. (There might be future work there but that's a later phase of work.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, I'm tempted to remove the early return and somehow make it more multiregionStart that doesn't apply to paramterized/periodic jobs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a bad idea. There's a big blog of return nil at the top of the ENT functionality where we return early when we don't need it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'll follow up in another PR - I'm a bit confused about the interaction and would love to do more testing.

nomad/job_endpoint.go Outdated Show resolved Hide resolved
nomad/job_endpoint.go Outdated Show resolved Hide resolved
nomad/job_endpoint_test.go Show resolved Hide resolved
nomad/state/state_store.go Outdated Show resolved Hide resolved
Comment on lines +586 to +592
// always attempt upsert eval even if job deregister fail
if req.Eval != nil {
req.Eval.JobModifyIndex = index
if err := n.upsertEvals(index, []*structs.Evaluation{req.Eval}); err != nil {
return err
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this was intended to protect against the previous non-atomic behavior? Ex. if you deregister but the eval wasn't persisted, but then tried to deregister again, the job would be potentially gone but you'd still want an eval to clean it up?

nomad/fsm.go Show resolved Hide resolved
Comment on lines +586 to +592
// always attempt upsert eval even if job deregister fail
if req.Eval != nil {
req.Eval.JobModifyIndex = index
if err := n.upsertEvals(index, []*structs.Evaluation{req.Eval}); err != nil {
return err
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually seems like a bug to me, but I think one that does not impact correctness (only waste some work doing nothing).

From peeking around generic_sched.go and reconcile.go, I'm not seeing us check if the evaluation is a Deregister. We always seem to checked if Job.Stopped is true! This means if we fail to update the statestore with the stopped Job, the Deregister evaluation will be the same as any other evaluation and end up being a noop for the job as it is not stopped and presumably already scheduled/allocated.

I suspect that @tgross is correct and that the test is merely asserting the non-atomic behavior existed: #981

Since we're making Job+Eval submissions atomic, I think we should at least trying to make this section atomic as well. I can't figure out where a Deregister eval for a non-Stopped job would have a desirable affect, but perhaps somebody else can find a case?

nomad/fsm.go Outdated Show resolved Hide resolved
nomad/job_endpoint.go Outdated Show resolved Hide resolved
nomad/job_endpoint.go Outdated Show resolved Hide resolved
nomad/periodic.go Show resolved Hide resolved
nomad/job_endpoint.go Outdated Show resolved Hide resolved
Mahmood Ali added 4 commits July 15, 2020 08:49
We set the Eval field on job (de-)registration only after all servers
get upgraded, to avoid dealing with duplicate evals.
@notnoop notnoop merged commit 24a7506 into master Jul 15, 2020
@notnoop notnoop deleted the b-atomic-job-register branch July 15, 2020 17:48
drewbailey added a commit that referenced this pull request Jan 8, 2021
#8435 introduced atomic eval
insertion iwth job (de-)registration. This change removes a now obsolete
guard which checked if the index was equal to the job.CreateIndex, which
would empty the status. Now that the job regisration eval insetion is
atomic with the registration this check is no longer necessary to set
the job statuses correctly.
drewbailey added a commit that referenced this pull request Jan 11, 2021
#8435 introduced atomic eval
insertion iwth job (de-)registration. This change removes a now obsolete
guard which checked if the index was equal to the job.CreateIndex, which
would empty the status. Now that the job regisration eval insetion is
atomic with the registration this check is no longer necessary to set
the job statuses correctly.
drewbailey added a commit that referenced this pull request Jan 11, 2021
#8435 introduced atomic eval
insertion iwth job (de-)registration. This change removes a now obsolete
guard which checked if the index was equal to the job.CreateIndex, which
would empty the status. Now that the job regisration eval insetion is
atomic with the registration this check is no longer necessary to set
the job statuses correctly.
drewbailey added a commit that referenced this pull request Jan 15, 2021
#8435 introduced atomic eval
insertion iwth job (de-)registration. This change removes a now obsolete
guard which checked if the index was equal to the job.CreateIndex, which
would empty the status. Now that the job regisration eval insetion is
atomic with the registration this check is no longer necessary to set
the job statuses correctly.
drewbailey added a commit that referenced this pull request Jan 19, 2021
#8435 introduced atomic eval
insertion iwth job (de-)registration. This change removes a now obsolete
guard which checked if the index was equal to the job.CreateIndex, which
would empty the status. Now that the job regisration eval insetion is
atomic with the registration this check is no longer necessary to set
the job statuses correctly.
drewbailey added a commit that referenced this pull request Jan 22, 2021
#8435 introduced atomic eval
insertion iwth job (de-)registration. This change removes a now obsolete
guard which checked if the index was equal to the job.CreateIndex, which
would empty the status. Now that the job regisration eval insetion is
atomic with the registration this check is no longer necessary to set
the job statuses correctly.
drewbailey added a commit that referenced this pull request Jan 22, 2021
* Prevent Job Statuses from being calculated twice

#8435 introduced atomic eval
insertion iwth job (de-)registration. This change removes a now obsolete
guard which checked if the index was equal to the job.CreateIndex, which
would empty the status. Now that the job regisration eval insetion is
atomic with the registration this check is no longer necessary to set
the job statuses correctly.

* test to ensure only single job event for job register

* periodic e2e

* separate job update summary step

* fix updatejobstability to use copy instead of modified reference of job

* update envoygatewaybindaddresses copy to prevent job diff on null vs empty

* set ConsulGatewayBindAddress to empty map instead of nil

fix nil assertions for empty map

rm unnecessary guard
backspace pushed a commit that referenced this pull request Jan 22, 2021
* Prevent Job Statuses from being calculated twice

#8435 introduced atomic eval
insertion iwth job (de-)registration. This change removes a now obsolete
guard which checked if the index was equal to the job.CreateIndex, which
would empty the status. Now that the job regisration eval insetion is
atomic with the registration this check is no longer necessary to set
the job statuses correctly.

* test to ensure only single job event for job register

* periodic e2e

* separate job update summary step

* fix updatejobstability to use copy instead of modified reference of job

* update envoygatewaybindaddresses copy to prevent job diff on null vs empty

* set ConsulGatewayBindAddress to empty map instead of nil

fix nil assertions for empty map

rm unnecessary guard
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make Job + Eval submitting atomic
3 participants