Cannot delete job when it has an allocation that was evicted from the node no longer available. #6624

Closed
jacobomarquis opened this issue Nov 6, 2019 · 4 comments · Fixed by #6902
@jacobomarquis

Nomad version: 0.10.1
Operating system: Ubuntu 16.04.6

Issue

A job cannot be deleted when it has an allocation that was evicted from a node and that node is no longer available.

Reproduction steps

The job's allocation gets evicted, and then the node it was running on disappears.

Output of nomad job status:

ID            = api
Name          = api
Submit Date   = 2019-11-06T10:58:53+01:00
Type          = service
Priority      = 50
Datacenters   = northeurope,westeurope
Status        = dead (stopped)
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api         0       0         1        52      121       25

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created     Modified
28e6a071  098775f0  api         31       evict    running  12d18h ago  12d18h ago

Node 098775f0 no longer exists, and this allocation lingers forever, even after garbage collection.
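To spot allocations in this state, one option is to scan the Allocations table for rows whose Desired column is evict while Status is still running, on a node that is not currently ready. The Python sketch below does that. find_lingering_evictions is a hypothetical helper, not part of Nomad; it assumes the column order shown above and task group names without whitespace. The CLI's text output is not a stable interface, so treat this as a quick diagnostic rather than tooling.

```python
from __future__ import annotations


def find_lingering_evictions(alloc_table: str, ready_node_ids: set[str]) -> list[str]:
    """Return IDs of allocations that look stuck after an eviction.

    alloc_table is the "Allocations" table printed by `nomad job status`,
    header line included. Hypothetical helper; it assumes the first six
    whitespace-separated fields of each row are ID, Node ID, Task Group,
    Version, Desired and Status (i.e. task group names contain no spaces).
    """
    stuck = []
    for line in alloc_table.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # not a data row
        alloc_id, node_id, _group, _version, desired, status = fields[:6]
        # Desired "evict" but still "running" on a node that is down or gone.
        if desired == "evict" and status == "running" and node_id not in ready_node_ids:
            stuck.append(alloc_id)
    return stuck


# Allocations table from the report above; node 098775f0 no longer exists,
# so it is absent from the set of ready nodes.
table = """\
ID        Node ID   Task Group  Version  Desired  Status   Created     Modified
28e6a071  098775f0  api         31       evict    running  12d18h ago  12d18h ago
"""
print(find_lingering_evictions(table, ready_node_ids=set()))  # ['28e6a071']
```

In the reproduction later in this thread, node e7abc57c is still listed but down, so passing only the ready node 35df84fd would flag allocation 20ef5df9 the same way.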

Job file (if appropriate)

N/A

@jacobomarquis jacobomarquis changed the title Cannot delete job when it has an allocation that was evicted from the node and then node is no longer available. Cannot delete job when it has an allocation that was evicted from the node no longer available. Nov 6, 2019
@tgross tgross added this to Needs Triage in Nomad - Community Issues Triage via automation Nov 6, 2019
@drewbailey drewbailey self-assigned this Dec 18, 2019
@drewbailey
Contributor

Hi @jacobomarquis, I'm looking into this and have a few questions to help reproduce the issue.

Do you mind sharing the job file for this? Are you running Nomad Enterprise or the open-source version? I ask because, as far as I know, a desired status of evict is only used for preemption; your nomad job status output shows Type = service, and preemption of service jobs is an Enterprise-only feature.

@drewbailey
Contributor

Just following up: I was able to reproduce this by preempting a job and immediately terminating its node before the preempted allocation could complete or be evicted.

→ nomad node status
ID        DC   Name     Class    Drain  Eligibility  Status
35df84fd  dc1  client2  class-2  false  eligible     ready
e7abc57c  dc1  client1  class-1  false  eligible     down

→ nomad job status evict
ID            = evict
Name          = evict
Submit Date   = 2019-12-18T16:23:01-05:00
Type          = service
Priority      = 10
Datacenters   = dc1
Status        = pending
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api         1       0         1        0       0         0

Placement Failure
Task Group "api":
  * Class "class-2" filtered 1 nodes
  * Constraint "${meta.tag} = foo" filtered 1 nodes

Latest Deployment
ID          = 7be28b39
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
api         1        1       1        0          2019-12-18T16:33:11-05:00

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
20ef5df9  e7abc57c  api         0        evict    running  5m18s ago  4m41s ago
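As a rough illustration of how an allocation can end up stuck like this, here is a simplified Python model. It is a hypothetical sketch, not Nomad's scheduler code, and rests on two assumptions: a desired status of evict makes an allocation count as terminal, and the step that marks allocations on down nodes as lost skips allocations already considered terminal. Under those assumptions the preempted allocation is skipped, and because its node is down, no client ever reports it as stopped, so it stays evict/running. The names Alloc, sweep_down_nodes and client_terminal_only are invented for this example, and the alternative rule shown is only one possible remedy, not necessarily what #6902 changed.

```python
from dataclasses import dataclass


# Hypothetical, simplified model; not Nomad's actual scheduler code.
@dataclass
class Alloc:
    alloc_id: str
    node_id: str
    desired: str  # desired status set by the servers: "run", "stop", "evict"
    client: str   # status reported by the client: "running", "complete", "failed", "lost"


DESIRED_TERMINAL = {"stop", "evict"}
CLIENT_TERMINAL = {"complete", "failed", "lost"}


def is_terminal(a: Alloc) -> bool:
    # Assumption: an alloc counts as terminal if EITHER status is terminal.
    return a.desired in DESIRED_TERMINAL or a.client in CLIENT_TERMINAL


def client_terminal_only(a: Alloc) -> bool:
    # Alternative rule: only the client-reported status decides.
    return a.client in CLIENT_TERMINAL


def sweep_down_nodes(allocs, down_nodes, skip_if=is_terminal):
    """Mark non-skipped allocations on down nodes as lost; return their IDs."""
    marked = []
    for a in allocs:
        if skip_if(a) or a.node_id not in down_nodes:
            continue
        a.client = "lost"
        marked.append(a.alloc_id)
    return marked


# The reproduction above: preempted alloc 20ef5df9 on the down node e7abc57c.
alloc = Alloc("20ef5df9", "e7abc57c", desired="evict", client="running")
print(sweep_down_nodes([alloc], {"e7abc57c"}))                               # []
print(sweep_down_nodes([alloc], {"e7abc57c"}, skip_if=client_terminal_only))  # ['20ef5df9']
```

With the first rule the allocation is never touched and keeps reporting running; with the client-only rule it is marked lost, which would let the job be stopped and garbage-collected normally.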

@jacobomarquis
Author

Nice to hear you were able to reproduce the issue. This is exactly what happened in our infrastructure.

@drewbailey drewbailey moved this from Needs Triage to Triaged in Nomad - Community Issues Triage Dec 19, 2019
@drewbailey drewbailey moved this from Triaged to In Review in Nomad - Community Issues Triage Jan 7, 2020
Nomad - Community Issues Triage automation moved this from In Review to Done Jan 7, 2020
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 15, 2022