After removing aws-ebs0 from the cluster deleting nodes fails #8121

Closed
analytically opened this issue Jun 5, 2020 · 7 comments · Fixed by #8619

Comments

@analytically

Nomad version

Nomad v0.11.2 (807cfeb)

Operating system and Environment details

Amazon Linux 2, c5.4xlarge

Issue

After removing the aws-ebs0 plugin from our cluster, we're seeing the client list grow unbounded with ineligible nodes; the nodes are never removed.

The logs show:
nomad.fsm: DeleteNode failed: error="csi plugin delete failed: csi_plugins missing plugin aws-ebs0"

@analytically
Author

Similar to #8100

@tgross
Member

tgross commented Jun 8, 2020

Hi @analytically can you clarify for me the symptom you're seeing here:

the client list with ineligible nodes grow unbounded without removing nodes.

Nomad nodes are being marked as ineligible for placing workloads? Or do you mean you can't schedule CSI workloads on those nodes?

Similar to #8100

That issue is about volume claim GC, which should be unrelated.

@analytically
Author

No, this is also the volume claim GC (Nomad nodes GC).

@tgross
Member

tgross commented Jun 22, 2020

There's a difference between "volume claim GC" (which is the cleanup of claims that an allocation has on a volume) and a "node GC" (which is the cleanup of Nomad clients).

I'm still not clear on what symptom you're seeing: are Nomad nodes being marked as ineligible for placing workloads? Or do you mean you can't schedule CSI workloads on those nodes?

@tgross tgross removed their assignment Jul 28, 2020
@tgross
Member

tgross commented Jul 29, 2020

Some follow-up data which isn't quite the same thing but could be related. If we try to delete a job that hasn't registered its plugin, that results in the following error state:

2020-07-29T14:54:24.417Z [ERROR] nomad.fsm: DeleteJob failed: error="deleting job from plugin: plugin missing: aws-ebs0 <nil>"
2020-07-29T14:54:24.417Z [ERROR] nomad.fsm: deregistering job failed: error="deleting job from plugin: plugin missing: aws-ebs0 <nil>"

@tgross
Member

tgross commented Aug 10, 2020

Should be fixed with #8619

@github-actions

github-actions bot commented Nov 3, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 3, 2022