quotas: evaluate quota feasibility last in scheduler #10753

Merged 2 commits on Jun 14, 2021

8 changes: 7 additions & 1 deletion CHANGELOG.md
@@ -3,6 +3,9 @@
IMPROVEMENTS:
* cli: Added `-monitor` flag to `deployment status` command and automatically monitor deployments from `job run` command. [[GH-10661](https://github.com/hashicorp/nomad/pull/10661)]

+ BUG FIXES:
+ * quotas (Enterprise): Fixed a bug where quotas were evaluated before constraints, resulting in quota capacity being used up by filtered nodes. [[GH-10753](https://github.com/hashicorp/nomad/issues/10753)]
+
## 1.1.1 (June 9, 2021)

FEATURES:
@@ -114,14 +117,17 @@ BUG FIXES:
* server: Fixed a panic that may arise on submission of jobs containing invalid service checks [[GH-10154](https://github.com/hashicorp/nomad/issues/10154)]
* ui: Fixed the rendering of interstitial components shown after processing a dynamic application sizing recommendation. [[GH-10094](https://github.com/hashicorp/nomad/pull/10094)]

+ ## 1.0.8 (Unreleased)
+ * quotas (Enterprise): Fixed a bug where quotas were evaluated before constraints, resulting in quota capacity being used up by filtered nodes. [[GH-10753](https://github.com/hashicorp/nomad/issues/10753)]
+ * quotas (Enterprise): Fixed a bug where stopped allocations for a failed deployment can be double-credited to quota limits, resulting in a quota limit bypass. [[GH-10694](https://github.com/hashicorp/nomad/issues/10694)]
+
## 1.0.7 (June 9, 2021)

BUG FIXES:
* api: Fixed event stream connection initialization when there are no events to send [[GH-10637](https://github.com/hashicorp/nomad/issues/10637)]
* cli: Fixed a bug where `plugin status` did not validate the passed `type` flag correctly [[GH-10712](https://github.com/hashicorp/nomad/pull/10712)]
* cli: Fixed a bug where `alloc exec` may fail with "unexpected EOF" without returning the exit code after a command [[GH-10657](https://github.com/hashicorp/nomad/issues/10657)]
* client: Fixed a bug where `alloc exec` sessions may terminate abruptly after a few minutes [[GH-10710](https://github.com/hashicorp/nomad/issues/10710)]
- * quotas (Enterprise): Fixed a bug where stopped allocations for a failed deployment can be double-credited to quota limits, resulting in a quota limit bypass. [[GH-10694](https://github.com/hashicorp/nomad/issues/10694)]

@tgross (Member, Author) commented on Jun 14, 2021:

I've moved this because this backport patch didn't land in 1.0.7 after all; we'll call it out in the release notes for 1.1.2/1.0.8.

* drivers/exec: Fixed a bug where `exec` and `java` tasks inherit the Nomad agent's `oom_score_adj` value [[GH-10698](https://github.com/hashicorp/nomad/issues/10698)]
* ui: Fixed a bug where exec would not work across regions. [[GH-10539](https://github.com/hashicorp/nomad/issues/10539)]
* ui: Fixed global-search shortcut for non-english keyboards. [[GH-10714](https://github.com/hashicorp/nomad/issues/10714)]
30 changes: 18 additions & 12 deletions scheduler/stack.go
@@ -208,10 +208,6 @@ func NewSystemStack(ctx Context) *SystemStack {
// have to evaluate on all nodes.
s.source = NewStaticIterator(ctx, nil)

- // Create the quota iterator to determine if placements would result in the
- // quota attached to the namespace of the job to go over.
- s.quota = NewQuotaIterator(ctx, s.source)
-
// Attach the job constraints. The job is filled in later.
s.jobConstraint = NewConstraintChecker(ctx, nil)

@@ -243,13 +239,20 @@ func NewSystemStack(ctx Context) *SystemStack {
s.taskGroupDevices,
s.taskGroupNetwork}
avail := []FeasibilityChecker{s.taskGroupCSIVolumes}
- s.wrappedChecks = NewFeasibilityWrapper(ctx, s.quota, jobs, tgs, avail)
+ s.wrappedChecks = NewFeasibilityWrapper(ctx, s.source, jobs, tgs, avail)

// Filter on distinct property constraints.
s.distinctPropertyConstraint = NewDistinctPropertyIterator(ctx, s.wrappedChecks)

+ // Create the quota iterator to determine if placements would result in
+ // the quota attached to the namespace of the job to go over.
+ // Note: the quota iterator must be the last feasibility iterator before
+ // we upgrade to ranking, or our quota usage will include ineligible
+ // nodes!
+ s.quota = NewQuotaIterator(ctx, s.distinctPropertyConstraint)
+
// Upgrade from feasible to rank iterator
- rankSource := NewFeasibleRankIterator(ctx, s.distinctPropertyConstraint)
+ rankSource := NewFeasibleRankIterator(ctx, s.quota)

// Apply the bin packing, this depends on the resources needed
// by a particular task group. Enable eviction as system jobs are high
@@ -330,10 +333,6 @@ func NewGenericStack(batch bool, ctx Context) *GenericStack {
// balancing across eligible nodes.
s.source = NewRandomIterator(ctx, nil)

- // Create the quota iterator to determine if placements would result in the
- // quota attached to the namespace of the job to go over.
- s.quota = NewQuotaIterator(ctx, s.source)
-
// Attach the job constraints. The job is filled in later.
s.jobConstraint = NewConstraintChecker(ctx, nil)

@@ -366,16 +365,23 @@ func NewGenericStack(batch bool, ctx Context) *GenericStack {
s.taskGroupDevices,
s.taskGroupNetwork}
avail := []FeasibilityChecker{s.taskGroupCSIVolumes}
- s.wrappedChecks = NewFeasibilityWrapper(ctx, s.quota, jobs, tgs, avail)
+ s.wrappedChecks = NewFeasibilityWrapper(ctx, s.source, jobs, tgs, avail)

// Filter on distinct host constraints.
s.distinctHostsConstraint = NewDistinctHostsIterator(ctx, s.wrappedChecks)

// Filter on distinct property constraints.
s.distinctPropertyConstraint = NewDistinctPropertyIterator(ctx, s.distinctHostsConstraint)

+ // Create the quota iterator to determine if placements would result in
+ // the quota attached to the namespace of the job to go over.
+ // Note: the quota iterator must be the last feasibility iterator before
+ // we upgrade to ranking, or our quota usage will include ineligible
+ // nodes!
+ s.quota = NewQuotaIterator(ctx, s.distinctPropertyConstraint)
+
// Upgrade from feasible to rank iterator
- rankSource := NewFeasibleRankIterator(ctx, s.distinctPropertyConstraint)
+ rankSource := NewFeasibleRankIterator(ctx, s.quota)

// Apply the bin packing, this depends on the resources needed
// by a particular task group.
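
For readers who want to see why the position of the quota iterator matters, here is a minimal, self-contained sketch of a pull-based iterator chain. It is a toy model, not the scheduler's code: the `node`, `feasibleIterator`, `staticIterator`, `constraintIterator`, and `quotaIterator` types and the `feasibleNodes` helper are all hypothetical, and the assumption that the quota check charges one unit of budget for every node it lets through is a simplification made purely for illustration. The real Enterprise quota logic is not shown in this diff.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a client node; only the OS matters here.
type node struct {
	id string
	os string
}

// feasibleIterator mimics the pull-based shape of a feasibility chain:
// Next returns the next candidate node, or nil when none remain.
type feasibleIterator interface {
	Next() *node
}

// staticIterator yields a fixed list of nodes in order.
type staticIterator struct {
	nodes []*node
	pos   int
}

func (s *staticIterator) Next() *node {
	if s.pos >= len(s.nodes) {
		return nil
	}
	n := s.nodes[s.pos]
	s.pos++
	return n
}

// constraintIterator skips nodes that fail the predicate.
type constraintIterator struct {
	source feasibleIterator
	ok     func(*node) bool
}

func (c *constraintIterator) Next() *node {
	for n := c.source.Next(); n != nil; n = c.source.Next() {
		if c.ok(n) {
			return n
		}
	}
	return nil
}

// quotaIterator charges one unit of budget for every node it passes
// downstream. This is an assumed simplification, not the Enterprise logic.
type quotaIterator struct {
	source    feasibleIterator
	remaining int
}

func (q *quotaIterator) Next() *node {
	if q.remaining <= 0 {
		return nil
	}
	n := q.source.Next()
	if n != nil {
		q.remaining--
	}
	return n
}

// feasibleNodes builds the chain in either order and drains it.
func feasibleNodes(quotaFirst bool) []string {
	var src feasibleIterator = &staticIterator{nodes: []*node{
		{"win-1", "windows"}, {"win-2", "windows"}, {"win-3", "windows"},
		{"linux-1", "linux"}, {"linux-2", "linux"},
	}}
	isLinux := func(n *node) bool { return n.os == "linux" }

	var chain feasibleIterator
	if quotaFirst {
		// Old ordering: the quota check sits directly on the source,
		// so nodes the constraint later rejects still consume budget.
		quota := &quotaIterator{source: src, remaining: 2}
		chain = &constraintIterator{source: quota, ok: isLinux}
	} else {
		// New ordering: the quota check only sees nodes that passed.
		constrained := &constraintIterator{source: src, ok: isLinux}
		chain = &quotaIterator{source: constrained, remaining: 2}
	}

	var ids []string
	for n := chain.Next(); n != nil; n = chain.Next() {
		ids = append(ids, n.id)
	}
	return ids
}

func main() {
	fmt.Println("quota first:", feasibleNodes(true))
	fmt.Println("quota last:", feasibleNodes(false))
}
```

With a budget of 2 and the constraint "linux only", the quota-first chain spends its whole budget on `win-1` and `win-2` and yields no feasible nodes, while the quota-last chain yields `linux-1` and `linux-2`. That mirrors the changelog entry: capacity being used up by nodes that are filtered out afterwards.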