Drain v2: add controlled draining #4010

Merged
merged 79 commits into from
Mar 22, 2018

schmichael (Member) commented Mar 20, 2018

Fixes #2736

Features in this PR:

  • Drain v2 - adds a migrate stanza to jobs, similar to update, to control the rate at which jobs are drained from draining nodes.
  • testlog improvement: set NOMAD_TEST_STDOUT=1 to send test log output to stdout instead of t.Logf (handy when iterating on a slow test over and over)
  • mock.BatchJob() - does what it says on the tin. Easier than editing mock.Job()

Future PRs will extend drain testing and add documentation.
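
As a rough illustration of the testlog behaviour listed above, here is a minimal sketch of a writer that goes to stdout when NOMAD_TEST_STDOUT=1 and otherwise forwards to a Logf-style function such as t.Logf. The names logfWriter, newWriter and NewTestWriter are hypothetical and are not taken from this PR; only the environment variable comes from the description.

```go
package main

import (
	"io"
	"os"
)

// logfWriter adapts a Logf-style function (for example t.Logf) to io.Writer.
// Hypothetical helper for this sketch, not the PR's implementation.
type logfWriter struct {
	logf func(format string, args ...interface{})
}

// Write forwards the bytes to logf as a single formatted message.
func (w logfWriter) Write(p []byte) (int, error) {
	w.logf("%s", p)
	return len(p), nil
}

// newWriter returns os.Stdout when useStdout is true and otherwise a
// writer that forwards every write to logf.
func newWriter(useStdout bool, logf func(format string, args ...interface{})) io.Writer {
	if useStdout {
		return os.Stdout
	}
	return logfWriter{logf: logf}
}

// NewTestWriter chooses the destination from the NOMAD_TEST_STDOUT
// environment variable named in the PR description.
func NewTestWriter(logf func(format string, args ...interface{})) io.Writer {
	return newWriter(os.Getenv("NOMAD_TEST_STDOUT") == "1", logf)
}
```

In a test this would be used as NewTestWriter(t.Logf), with the result passed to whatever logger the code under test accepts.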

@schmichael schmichael changed the title [WIP] Alt rebased of drainv2 Drain v2: add controlled draining Mar 20, 2018
api/nodes.go Outdated
// will disable draining.
DrainSpec *DrainSpec

// MarkEligible marks the node as eligible if removing the drain strategy.
Member

// MarkEligible marks the node as eligible for scheduling if removing the drain strategy.

type DrainSpec struct {
	// Deadline is the duration after StartTime when the remaining
	// allocations on a draining Node should be told to stop.
	Deadline time.Duration
Member

It's possible I missed it because of the large feature branch, but if not, where is the validation logic for this? E.g., make sure that the deadline is >= 0.

Member Author

where is the validation logic for this? E.g., make sure that the deadline is >= 0.

In the UpdateDrain RPC endpoint itself. A Deadline <= 0 is a force drain: https://github.com/hashicorp/nomad/pull/4010/files#diff-57d6a6426f4ff4ab5d9eecfbca4399daR446

It's possible I missed it because of the large feature branch

I feel like it would be impossible for you to catch it in such a huge PR so no worries.
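
To make the deadline rule from the reply above concrete (a Deadline <= 0 is treated as a force drain), here is a minimal sketch. DrainSpec carries only the field quoted in the diff; isForceDrain and stopTime are hypothetical helpers and are not the UpdateDrain endpoint's actual code.

```go
package main

import "time"

// DrainSpec mirrors only the Deadline field quoted in the review thread.
type DrainSpec struct {
	Deadline time.Duration
}

// isForceDrain applies the rule stated in the reply: a deadline that is
// zero or negative means the drain is forced.
// Hypothetical helper, not the RPC endpoint's real validation.
func isForceDrain(spec DrainSpec) bool {
	return spec.Deadline <= 0
}

// stopTime returns when remaining allocations should be told to stop:
// immediately for a force drain, otherwise Deadline after the start time.
func stopTime(spec DrainSpec, start time.Time) time.Time {
	if isForceDrain(spec) {
		return start
	}
	return start.Add(spec.Deadline)
}
```

Under this reading no value of Deadline is rejected as invalid; non-positive values simply select the force path.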

if alloc.Job.Type == structs.JobTypeBatch && alloc.RanSuccessfully() {
untainted[alloc.ID] = alloc
// Non-terminal allocs that should migrate should always migrate
if alloc.DesiredTransition.ShouldMigrate() {
Member

Should this also check that the alloc's desired status is currently DesiredStatusRun? What if the same alloc is revisited by the reconciler after it has already been migrated?

Member Author

alloc.TerminalStatus() above will be true if DesiredStatus != "run" so the alloc will not be migrated.
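
The ordering described in this reply can be sketched as follows: terminal allocations are filtered out first, so an alloc whose desired status is no longer "run" never reaches the migrate check. The Allocation type, its ShouldMigrate field and the toMigrate function are simplified stand-ins assumed for illustration; they are not the reconciler's real types.

```go
package main

// Status values used by this sketch; the set is deliberately incomplete.
const (
	DesiredStatusRun     = "run"
	ClientStatusComplete = "complete"
)

// Allocation is a simplified stand-in for the scheduler's allocation type.
type Allocation struct {
	ID            string
	DesiredStatus string
	ClientStatus  string
	ShouldMigrate bool // stands in for DesiredTransition.ShouldMigrate()
}

// TerminalStatus reports true when the desired status is anything other
// than "run", or when the client has already finished the allocation.
func (a Allocation) TerminalStatus() bool {
	if a.DesiredStatus != DesiredStatusRun {
		return true
	}
	return a.ClientStatus == ClientStatusComplete
}

// toMigrate returns the IDs of allocations that are non-terminal and
// flagged for migration.
func toMigrate(allocs []Allocation) []string {
	var ids []string
	for _, a := range allocs {
		if a.TerminalStatus() {
			continue // already stopped or finished, so never migrated again
		}
		if a.ShouldMigrate {
			ids = append(ids, a.ID)
		}
	}
	return ids
}
```

An alloc that was migrated on an earlier pass has a desired status other than "run", so a second visit by the reconciler skips it at the TerminalStatus check.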

HealthyDeadline *time.Duration `mapstructure:"healthy_deadline"`
}

func DefaultMigrateStrategy() *MigrateStrategy {
Contributor

Prefer this being defined in structs like the rest of them

Member Author

Do we prefer calling the structs version from here or hardcoding defaults in Canonicalize? I see a variety of approaches in this file.
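
For reference, one of the two approaches mentioned here (Canonicalize pulling unset fields from a single default constructor rather than hardcoding values inline) might look like the sketch below. Only HealthyDeadline appears in the diff; the MaxParallel field and both default values are assumptions chosen for illustration.

```go
package main

import "time"

// MigrateStrategy is a trimmed-down stand-in. HealthyDeadline is the field
// shown in the diff; MaxParallel is assumed here for illustration.
type MigrateStrategy struct {
	MaxParallel     *int
	HealthyDeadline *time.Duration
}

// DefaultMigrateStrategy returns the defaults in one place.
// The concrete values are placeholders, not confirmed by this PR.
func DefaultMigrateStrategy() *MigrateStrategy {
	maxParallel := 1
	healthyDeadline := 5 * time.Minute
	return &MigrateStrategy{
		MaxParallel:     &maxParallel,
		HealthyDeadline: &healthyDeadline,
	}
}

// Canonicalize fills any unset (nil) field from DefaultMigrateStrategy
// and leaves explicitly set fields untouched.
func (m *MigrateStrategy) Canonicalize() {
	d := DefaultMigrateStrategy()
	if m.MaxParallel == nil {
		m.MaxParallel = d.MaxParallel
	}
	if m.HealthyDeadline == nil {
		m.HealthyDeadline = d.HealthyDeadline
	}
}
```

Keeping the defaults in one constructor means the values cannot drift between the constructor and Canonicalize; the alternative of hardcoding them in Canonicalize avoids the extra allocation but duplicates the numbers.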


// Neither deployments nor migrations care about system jobs so never
// watch their health
if alloc.Job.Type == structs.JobTypeSystem {
Contributor

Should this be != Service?

Can't have migration on batch or system, and update isn't defined on batch.

Member Author

Oh! I didn't realize batch didn't support update. Our docs don't mention that: https://www.nomadproject.io/docs/job-specification/update.html
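
The difference between the check in the diff and the reviewer's suggestion only shows up for batch jobs, as this small sketch illustrates. The function names are hypothetical and the job type strings are assumed values for the structs.JobType* constants.

```go
package main

// Job type names assumed for this sketch.
const (
	JobTypeService = "service"
	JobTypeBatch   = "batch"
	JobTypeSystem  = "system"
)

// skipIfSystem mirrors the check in the diff: health watching is skipped
// for system jobs only.
func skipIfSystem(jobType string) bool {
	return jobType == JobTypeSystem
}

// skipUnlessService is the reviewer's alternative: health watching is
// skipped for every job type except service.
func skipUnlessService(jobType string) bool {
	return jobType != JobTypeService
}
```

Both predicates agree for service and system jobs; for batch jobs skipIfSystem still watches health while skipUnlessService does not.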

@schmichael schmichael deleted the f-drain-rebased2 branch August 28, 2018 17:18
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 28, 2023