System job with constraints fails to plan #12748

Open
chilloutman opened this issue Apr 22, 2022 · 16 comments
Labels
stage/accepted (Confirmed, and intend to work on. No timeline commitment though.) · theme/scheduling · theme/system-scheduler · type/bug

Comments

@chilloutman

Nomad version

v1.2.6

(Nomad v1.2.6 has the problem described below, while Nomad v1.1.5 works as expected.)

Operating system and Environment details

Nomad nodes are running Ubuntu. Docker driver is used for all tasks.

A set of nodes has node.class set to worker, and there are a few other nodes in the cluster.
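
For context, node.class comes from the client configuration on each node; a minimal sketch of what the worker nodes' config might contain (the file path is illustrative, not taken from this setup):

# /etc/nomad.d/client.hcl (illustrative path)
client {
  enabled    = true
  node_class = "worker"   # exposed to the scheduler as ${node.class}
}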

Issue

A system job with constraints fails to plan.

Reproduction steps

A job with type = "system" is used to schedule tasks on the worker nodes, so the following constraint is added to the worker group:

constraint {
  attribute = "${node.class}"
  operator  = "="
  value     = "worker"
}
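
For reference, a minimal system job carrying this constraint might look like the sketch below (job name, datacenter, and image are placeholders, not taken from the original job):

job "worker" {
  datacenters = ["dc1"]
  type        = "system"

  group "worker" {
    constraint {
      attribute = "${node.class}"
      operator  = "="
      value     = "worker"
    }

    task "worker" {
      driver = "docker"

      config {
        # Placeholder image for illustration only
        image = "example/worker:latest"
      }
    }
  }
}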

Expected Result

All the worker nodes should run the worker task; all other nodes should not.

Actual Result

This works sometimes, in particular when there are no allocations on the cluster. But running nomad job plan after allocations are running displays the following warning:

Scheduler dry-run:
- WARNING: Failed to place allocations on all nodes.
  Task Group "worker" (failed to place 1 allocation):
    * Class "entry": 1 nodes excluded by filter
    * Constraint "${node.class} = worker": 1 nodes excluded by filter

This should not be a warning, since the planned allocations match the job definition once the constraints are taken into account.
nomad job run produces the desired state, and the job is displayed as “not scheduled” on all non-worker nodes.

Removing the constraints removes the warning, but it obviously schedules the worker task on non-worker nodes, which is unwanted.

The only workarounds seem to be to ignore the warnings, which defeats the purpose of nomad job plan, or to create an entirely separate cluster for the workers.

Possibly related:

@jrasell jrasell added this to Needs Triage in Nomad - Community Issues Triage via automation Apr 22, 2022
@cr0c0dylus

cr0c0dylus commented Apr 25, 2022

I'm facing the same problem (1.2.6):

Job: "stage-cron"
Task Group: "cron" (1 ignore)
Task: "cron"

Scheduler dry-run:

  • WARNING: Failed to place allocations on all nodes.
    Task Group "cron" (failed to place 1 allocation):
    • Constraint "${meta.env} = stage": 5 nodes excluded by filter

But if I stop the job before submitting a new one, it works as expected:

$ nomad job stop stage-cron
==> 2022-04-25T18:45:07+03:00: Monitoring evaluation "86e8c675"
2022-04-25T18:45:07+03:00: Evaluation triggered by job "stage-cron"
==> 2022-04-25T18:45:08+03:00: Monitoring evaluation "86e8c675"
2022-04-25T18:45:08+03:00: Evaluation status changed: "pending" -> "complete"
==> 2022-04-25T18:45:08+03:00: Evaluation "86e8c675" finished with status "complete"

$ nomad job plan ...

+/- Job: "stage-cron"
+/- Stop: "true" => "false"
Task Group: "cron" (1 create)
Task: "cron"

Scheduler dry-run:

  • All tasks successfully allocated.

@cr0c0dylus

I have found a temporary workaround: add a 1.1.x server to the cluster and stop and restart the 1.2.6 leaders until the 1.1.x server becomes the leader.

@tgross
Member

tgross commented May 2, 2022

Hi @chilloutman! This definitely seems like it could be related to #12016. I'm not going to mark it as a duplicate just in case it's not, but I'll cross-reference here so that whoever tackles that issue will see this as well. I don't have a good workaround for you other than to ignore the warnings (they're warnings and not errors), but I realize that isn't ideal.

Just FYI @cr0c0dylus:

I have found a temporary workaround: add a 1.1.x server to the cluster and stop and restart the 1.2.6 leaders until the 1.1.x server becomes the leader.

This is effectively downgrading Nomad into a mixed-version cluster, which is not supported and highly likely to result in state store corruption. Doing so in order to suppress something that's only a warning is not advised.

@tgross tgross added the stage/accepted (Confirmed, and intend to work on. No timeline commitment though.) label May 2, 2022
@tgross tgross moved this from Needs Triage to Needs Roadmapping in Nomad - Community Issues Triage May 2, 2022
@cr0c0dylus

Doing so in order to suppress something that's only a warning is not advised.

Unfortunately, it is not only a warning; it cannot allocate the job at all. Another trick is to change one of the limits in the resources stanza, for example adding +1 to the CPU limit. But that doesn't work with some of my jobs.

@ygersie
Contributor

ygersie commented May 31, 2022

I wonder if this is related to #11778 (comment). It really looks like a bug in the scheduler that incorrectly fails placement during the node feasibility check. It is almost as if it is not iterating through all nodes, but for some reason returns a placement failure before it has exhausted the full list.

@lssilva

lssilva commented Jun 7, 2022

I am also facing this issue, and I had to downgrade Nomad.

@chilloutman
Author

I'm wondering if this could be the cause: https://github.com/hashicorp/nomad/pull/11111/files#diff-c4e3135b7aa83ba07d59d003a8ab006915207425b8728c4cf070eee20ab9157a

"// track node filtering, to only report an error if all nodes have been filtered" might not be working as intended. Or maybe instead of only warnings #11111 ended up causing errors.

@jmwilkinson
Contributor

Verified we hit this with constraints on 1.2.6 as well.

Mitigation was reverting to 1.1.5.

I do not know how bugs are prioritized, but this one should probably be pretty high.

@cr0c0dylus

By the way, it would be great if those warnings could be completely disabled in the config. If I have 50 nodes in the cluster and a constraint that matches 3 of them, what is the sense of seeing "47 Not Scheduled"? System jobs are very useful for scaling in an HA configuration: I don't need to modify the job stanza, I just add or remove nodes with a special meta variable.
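
For anyone unfamiliar with that pattern, the idea is roughly as follows: tag the relevant clients with a meta value and constrain the system job on it. A sketch, with an illustrative key and value based on the ${meta.env} = stage constraint shown earlier:

# Client configuration on the nodes that should run the job
client {
  enabled = true

  meta {
    env = "stage"   # matched by the job via ${meta.env}
  }
}

# Constraint inside the system job's group
constraint {
  attribute = "${meta.env}"
  operator  = "="
  value     = "stage"
}

Adding or removing nodes carrying that meta value then scales the system job without touching the job file.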

@dext0r

dext0r commented Jun 30, 2022

I'm wondering if this could be the cause: https://github.com/hashicorp/nomad/pull/11111/files#diff-c4e3135b7aa83ba07d59d003a8ab006915207425b8728c4cf070eee20ab9157a

"// track node filtering, to only report an error if all nodes have been filtered" might not be working as intended. Or maybe instead of only warnings #11111 ended up causing errors.

It's the cause indeed. Reverting this pull request fixed the issue for me on 1.3.1.

@cr0c0dylus

Nomad v1.2.9 (86192e4)

The problem persists. I still need to stop the 1.2.9 masters in sequence until 1.0.18 becomes the leader and allows deployment.

@jmwilkinson
Contributor

There may be a fix in 1.3.2, at least it looks that way: https://github.com/hashicorp/nomad/blob/v1.3.2/scheduler/scheduler_system.go#L298

@seanamos

The issue still exists in v1.5.3; we frequently run into this when upgrading system jobs.

While the Nomad CLI reports this error, the rollout does still actually happen in Nomad.

@nCrazed

nCrazed commented Dec 20, 2023

I am seeing the same behavior as @seanamos in v1.6.3.

@cr0c0dylus

The problem continues to occur in v1.7.3.

@elgatopanzon

Can confirm still present in Nomad v1.7.7.
