
system scheduler: error when performing plan/run with disabled nodes #5169

Closed
jrasell opened this issue Jan 9, 2019 · 6 comments · Fixed by #6968

Comments

@jrasell
Member

jrasell commented Jan 9, 2019

Nomad version

Output from Nomad v0.8.6 (ab54ebcfcde062e9482558b7c052702d4cb8aa1b+CHANGES)

Operating system and Environment details

OS: ubuntu-18.04

Issue

When attempting to plan/run a Nomad system job (which includes constraints) on a node pool which includes administratively disabled nodes, the call will fail with the following error:

Unexpected response code: 500 (rpc error: could not find node \"f5dd4bad-ddec-796d-6ab5-d11d80324ad5\")

The plan/run will succeed once the node is enabled in the cluster.

Reproduction steps

Mark a client node as ineligible and then attempt to deploy a system job, potentially with constraints; I am not sure whether the constraints are a key part of this issue.
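The steps above can be sketched as a CLI sequence (the node ID is a placeholder, and the job file is assumed to be the example system job shown below; commands match the Nomad 0.8/0.9 CLI):

```shell
# Run a system job while all nodes are eligible
nomad run example.nomad

# Mark one client node as administratively ineligible
nomad node eligibility -disable <node-id>

# Planning (or re-running) the system job now fails with:
#   Unexpected response code: 500 (rpc error: could not find node "<node-id>")
nomad plan example.nomad
```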

Job file (if appropriate)

job "example" {
  datacenters = ["dc1"]
  type        = "system"

  group "example" {
    constraint {
      operator  = "="
      attribute = "${meta.role}"
      value     = "specificrole"
    }
...
@dadgar
Contributor

dadgar commented Jan 9, 2019

@jrasell what do you mean by disable? Drain the node, or mark it as ineligible?

@jrasell
Member Author

jrasell commented Jan 10, 2019

@dadgar when the node has been marked as ineligible.

@jorgemarey
Contributor

Hi,

This happened to me again on Nomad 0.9.5.

I was able to reproduce it by doing the following:

nomad agent -dev

nomad run example.nomad // change job to be system
nomad node eligibility -disable <node-id>
vi example.nomad // change the container version, for example
nomad plan example.nomad
Error during plan: Unexpected response code: 500 (could not find node "<node-id>")

I'm not able to update a system job if the cluster has even a single node in ineligible status.

@dpn

dpn commented Sep 26, 2019

Yep, still seeing this as well on 0.9.5.

It seems to work once the offending node goes offline, but if we do a drain -ignore-system and try to deploy while the node is ineligible and the old system job is still running, it will fail. So maybe this is working as intended, since Nomad can't update that one instance.

@codyja

codyja commented Jan 3, 2020

We're seeing this on 0.9.5 too. Can this be reopened?

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 13, 2022