
[Bug] Logs report "failed to acquire semaphore" during deletion #7818

Open

artem-nefedov opened this issue Jun 4, 2024 · 6 comments

artem-nefedov commented Jun 4, 2024

What were you trying to accomplish?

Delete the cluster (it seems to work fine).

What happened?

Logs report this message during deletion:

[ℹ]  deleting EKS cluster "redacted"
[ℹ]  will drain 0 unmanaged nodegroup(s) in cluster "redacted"
[ℹ]  starting parallel draining, max in-flight of 1
[✖]  failed to acquire semaphore while waiting for all routines to finish: %!w(*errors.errorString=&{context canceled})

Deletion still finished without errors, so it does not look like this affects anything. But the log does look like there's a problem.

The behavior is reproduced on all attempts.
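
The mangled "%!w(...)" in the log line points at the logging half of the problem: the %w verb is only understood by fmt.Errorf, so passing it to any other fmt-style formatter (such as a Sprintf-based logger) produces fmt's bad-verb placeholder instead of the error message. A minimal sketch in plain Go, not eksctl's actual logging code:

package main

import (
	"context"
	"fmt"
)

func main() {
	err := context.Canceled // an *errors.errorString with message "context canceled"

	// %w is an unsupported verb outside fmt.Errorf, so fmt emits a
	// bad-verb placeholder, reproducing the log output above:
	fmt.Println(fmt.Sprintf("failed to acquire semaphore: %w", err))
	// failed to acquire semaphore: %!w(*errors.errorString=&{context canceled})

	// %v (or fmt.Errorf with %w) renders the error as expected:
	fmt.Printf("failed to acquire semaphore: %v\n", err)
	// failed to acquire semaphore: context canceled
}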

How to reproduce it?

Create a cluster with 1 managed nodegroup and no unmanaged nodegroups, then delete it.

Versions

eksctl version: 0.180.0
EKS version: 1.30

The message was not present on eksctl version 0.176.0 with EKS version 1.29 (there were no changes in the cluster config besides the EKS version).

cPu1 (Collaborator) commented Jun 6, 2024

@artem-nefedov, this is a bug in the logging and concurrency handling, but it should not affect normal operation of the command. That part of the codebase is a bit dated and could use some refactoring. We'll look into this soon.
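
For the concurrency half, a common Go pattern, and, assuming eksctl's drainer follows it, a plausible source of this message, is to bound parallel draining with a weighted semaphore from golang.org/x/sync/semaphore and then wait for all workers by acquiring the semaphore's full weight. In recent versions of that package, Weighted.Acquire fails fast with the context's error when the context is already canceled, even if the full weight is immediately available, so a drain loop whose context gets canceled as deletion wraps up will log exactly this spurious failure despite having drained nothing. A hedged sketch; the identifiers are illustrative, not eksctl's actual code:

package main

import (
	"context"
	"fmt"

	"golang.org/x/sync/semaphore"
)

// drainAll bounds parallelism with a weighted semaphore, then waits for
// all in-flight workers by acquiring the semaphore's full weight.
func drainAll(ctx context.Context, nodeGroups []string, maxInFlight int64) error {
	sem := semaphore.NewWeighted(maxInFlight)
	for _, ng := range nodeGroups {
		if err := sem.Acquire(ctx, 1); err != nil {
			return err
		}
		go func(ng string) {
			defer sem.Release(1)
			// ... drain one nodegroup here ...
		}(ng)
	}
	// With recent golang.org/x/sync versions, this Acquire returns
	// context.Canceled as soon as ctx is done, even when no workers
	// were ever started ("will drain 0 unmanaged nodegroup(s)").
	if err := sem.Acquire(ctx, maxInFlight); err != nil {
		return fmt.Errorf("failed to acquire semaphore while waiting for all routines to finish: %w", err)
	}
	return nil
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // simulate the parent context being canceled during deletion
	if err := drainAll(ctx, nil, 1); err != nil {
		fmt.Println(err)
		// failed to acquire semaphore while waiting for all routines to finish: context canceled
	}
}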

@lgb861213

We also encountered the same error in the logs when deleting an EKS 1.30 cluster, and our eksctl version is 0.183. The error message follows:
2024-07-06 16:56:08 [ℹ] deleting EKS cluster "test"
2024-07-06 16:56:11 [ℹ] will drain 0 unmanaged nodegroup(s) in cluster "aloda-test"
2024-07-06 16:56:11 [ℹ] starting parallel draining, max in-flight of 1
2024-07-06 16:56:11 [✖] failed to acquire semaphore while waiting for all routines to finish: %!w(*errors.errorString=&{context canceled})
2024-07-06 16:56:14 [ℹ] deleted 0 Fargate profile(s)
2024-07-06 16:56:16 [✔] kubeconfig has been updated
2024-07-06 16:56:16 [ℹ] cleaning up AWS load balancers created by Kubernetes objects of Kind Service or Ingress
2024-07-06 16:56:23 [ℹ]

@AmitBenAmi (Contributor)

Seeing the same issue with version 0.185.0

@acarey-haus

Seeing this issue with eksctl version 0.187.0 when deleting a nodegroup. The deletion succeeded.

% eksctl delete nodegroup --cluster redacted --name redacted-nodegroup
2024-07-18 13:31:37 [ℹ]  1 nodegroup (redacted-nodegroup) was included (based on the include/exclude rules)
2024-07-18 13:31:37 [ℹ]  will drain 1 nodegroup(s) in cluster "redacted"
2024-07-18 13:31:37 [ℹ]  starting parallel draining, max in-flight of 1
2024-07-18 13:31:37 [!]  no nodes found in nodegroup "redacted-nodegroup" (label selector: "alpha.eksctl.io/nodegroup-name=redacted-nodegroup")
2024-07-18 13:31:37 [✖]  failed to acquire semaphore while waiting for all routines to finish: context canceled
2024-07-18 13:31:37 [ℹ]  will delete 1 nodegroups from cluster "redacted"
2024-07-18 13:31:40 [ℹ]  1 task: { 1 task: { delete nodegroup "redacted-nodegroup" [async] } }
2024-07-18 13:31:40 [ℹ]  will delete stack "eksctl-redacted-nodegroup-redacted-nodegroup"
2024-07-18 13:31:40 [✔]  deleted 1 nodegroup(s) from cluster "redacted"

fnzwex commented Aug 6, 2024

Test results after finding this issue, in an attempt to help:

0.176.0 - good, until EKS 1.30: 1.30 is not supported and it refuses to work
0.177.0 - this issue
0.178.0 - this issue
0.179.0 - this issue
0.180.0 through 0.186.0 - untested by me, but presumed affected since the surrounding versions are
0.187.0 - this issue
0.188.0 - STILL this issue - 5 days old.

It'd be great if this could be addressed ASAP and released as 0.189.0 soon. Any chance of that?

Pretty bad that it got broken in the first place and even worse that it got left broken for such a long time.

Still a better way to manage clusters than Terraform/OpenTofu IMO, when it works properly (which NO version does for 1.30).

@jarvisbot01

eksctl version 0.190.0
EKS 1.30

2024-09-21 14:55:45 [✖] failed to acquire semaphore while waiting for all routines to finish: context canceled
