Unable to deploy efs-csi-controller to Fargate to support Karpenter-provisioned EKS cluster #1100

Open
Nuru opened this issue Aug 16, 2023 · 16 comments · Fixed by #1195
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@Nuru

Nuru commented Aug 16, 2023

/kind bug

What happened?

  • I am using Terraform to manage AWS resources.
  • I tried to deploy, via Terraform, an EKS cluster with no nodes, but with the EFS CSI Add-On (and others). Nodes to be provisioned by Karpenter. The Karpenter controller itself is deployed to Fargate.
    • Karpenter provisions EC2 nodes on demand to run Kubernetes Pods.
    • I want the Pods (on EC2, provisioned by Karpenter) to have access to EFS.
    • Terraform fails to deploy the EKS cluster because the EFS Add-On never becomes ready (reports status as "Degraded"). I believe this is similar to EBS CSI ISSUE #1801: the controller pods need to be running for the Add-On to report being healthy, but they have no place to run.
  • I added a Fargate profile, targeting label app = "efs-csi-controller", so that the EFS controller would be launched to Fargate.
  • The Add-On still would not become healthy because the communication sockets were not created/available, and still reports status as "Degraded".
  • After Karpenter was deployed, it started nodes, and the efs-csi-node DaemonSet successfully deployed to the EC2 nodes, but the efs-csi-controller Pods were still in CrashLoopBackOff and the Add-On still reported its status as "Degraded".
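
For readers who want to see what the Fargate profile described above amounts to, here is a minimal sketch in Python. It only builds the request parameters; the dictionary keys follow the EKS CreateFargateProfile API (the same keyword arguments boto3's `create_fargate_profile` accepts). The `kube-system` namespace, the profile name, and the helper names are assumptions for illustration and are not taken from the report. The second helper shows the matching rule as I understand it: a Pod is scheduled to Fargate only if its namespace matches and it carries every label in the selector.

```python
def fargate_profile_request(cluster_name, pod_execution_role_arn, subnet_ids):
    """Build parameters for an EKS Fargate profile targeting the EFS controller.

    Assumptions (not from the original report): the add-on runs in the
    kube-system namespace, and the profile name is arbitrary.
    """
    return {
        "fargateProfileName": "efs-csi-controller",  # hypothetical name
        "clusterName": cluster_name,
        "podExecutionRoleArn": pod_execution_role_arn,
        "subnets": list(subnet_ids),
        "selectors": [
            {
                "namespace": "kube-system",  # assumed add-on namespace
                "labels": {"app": "efs-csi-controller"},  # label from the report
            }
        ],
    }


def selector_matches(selector, namespace, pod_labels):
    """Return True if a Pod in `namespace` with `pod_labels` matches the selector."""
    if selector["namespace"] != namespace:
        return False
    wanted = selector.get("labels") or {}
    return all(pod_labels.get(key) == value for key, value in wanted.items())


request = fargate_profile_request(
    "example-cluster",                                # hypothetical cluster name
    "arn:aws:iam::111122223333:role/example-role",    # hypothetical role ARN
    ["subnet-aaaa", "subnet-bbbb"],                   # hypothetical subnet IDs
)
selector = request["selectors"][0]
print(selector_matches(selector, "kube-system", {"app": "efs-csi-controller"}))
print(selector_matches(selector, "kube-system", {"app": "efs-csi-node"}))
```

With a selector like this, only the controller Pods land on Fargate; the efs-csi-node DaemonSet Pods carry a different label and stay on EC2 nodes.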

What you expected to happen?

The controller pods would be deployed to Fargate and work without the Node component, and the Add-On would report its status as "Active". As EC2 Nodes were provisioned, the controller Pods would keep working from Fargate while the Node Pods worked properly on the EC2 Nodes.

How to reproduce it (as minimally and precisely as possible)?

See "What happened" above.

Anything else we need to know?:

The failure that is reported to Kubernetes comes from the efs-plugin container exiting with an error. IMHO it should not try to run on Fargate, and probably should not be deployed as part of the controller for this reason.

Environment

  • Kubernetes version (use kubectl version): v1.27.4-eks-2d98532
  • Driver version: v1.5.8-eksbuild.1

Please also attach debug logs to help us better diagnose

Log excerpts (each one just keeps repeating the quoted excerpt):

efs-csi-controller csi-provisioner

W0816 04:26:59.779601       1 connection.go:183] Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock

efs-csi-controller liveness-probe

W0816 04:27:00.989300       1 connection.go:173] Still connecting to unix:///csi/csi.sock

efs-csi-controller efs-plugin

I0816 05:54:46.413768       1 config_dir.go:63] Mounted directories do not exist, creating directory at '/etc/amazon/efs'
I0816 05:54:46.418766       1 metadata.go:63] getting MetadataService...
I0816 05:54:52.757469       1 metadata.go:71] retrieving metadata from Kubernetes API
F0816 05:54:52.773395       1 driver.go:56] could not get metadata: did not find aws instance ID in node providerID string
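
The fatal error in the last log line suggests the plugin expects the node's providerID to end in an EC2 instance ID. The sketch below is not the driver's code; it is a hedged illustration in Python of that kind of lookup. The regular expression, the function name, and both sample providerID strings are assumptions made for the example. In particular, the Fargate-style string is only an illustrative shape of an ID with no `i-...` suffix.

```python
import re

# Assumption: an EC2-backed node's providerID ends with an instance ID such as
# "i-0123456789abcdef0". This pattern is illustrative, not the driver's own.
_INSTANCE_ID_RE = re.compile(r"i-[a-z0-9]+$")


def instance_id_from_provider_id(provider_id):
    """Extract a trailing EC2 instance ID from a node providerID string.

    Raises ValueError with the same wording as the log line above when the
    string does not end in an instance ID.
    """
    match = _INSTANCE_ID_RE.search(provider_id)
    if match is None:
        raise ValueError("did not find aws instance ID in node providerID string")
    return match.group(0)


# Hypothetical providerID values, for illustration only.
ec2_provider_id = "aws:///us-west-2a/i-0123456789abcdef0"
fargate_provider_id = (
    "aws:///us-west-2a/0123abcd/fargate-ip-10-0-1-23.us-west-2.compute.internal"
)

print(instance_id_from_provider_id(ec2_provider_id))
try:
    instance_id_from_provider_id(fargate_provider_id)
except ValueError as err:
    print(f"Fargate-style node: {err}")
```

Under these assumptions, any node whose providerID does not end in an instance ID fails the lookup, which would explain why the efs-plugin container exits on Fargate while the sidecars keep waiting for the socket.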

k8s-ci-robot added the kind/bug label Aug 16, 2023

@apenney

apenney commented Sep 22, 2023

I have the same issue. I would like to run the controllers on Fargate and have them attach EFS volumes to actual nodes that are then provisioned by Karpenter.

@z0rc

z0rc commented Feb 26, 2024

#1195 isn't sufficient for Fargate support. The latest EKS add-on, v1.7.6-eksbuild.1, sets securityContext.privileged: true for the controller pods, which Fargate does not support.

Please reopen.
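
As a rough way to check a manifest for the setting mentioned above, the following Python sketch lists the containers in a Pod spec that request `securityContext.privileged: true`. The function name and the sample spec are hypothetical; the spec is a trimmed stand-in for a controller Pod, not the add-on's actual manifest.

```python
def privileged_containers(pod_spec):
    """Return the names of containers that set securityContext.privileged: true.

    `pod_spec` is a Pod spec as a plain dict (for example, parsed from YAML).
    Both initContainers and containers are inspected.
    """
    names = []
    for key in ("initContainers", "containers"):
        for container in pod_spec.get(key) or []:
            security_context = container.get("securityContext") or {}
            if security_context.get("privileged") is True:
                names.append(container["name"])
    return names


# Hypothetical, trimmed Pod spec for illustration only.
sample_spec = {
    "containers": [
        {"name": "efs-plugin", "securityContext": {"privileged": True}},
        {"name": "csi-provisioner", "securityContext": {"privileged": False}},
        {"name": "liveness-probe"},
    ]
}

blocked = privileged_containers(sample_spec)
if blocked:
    print("Containers that would prevent scheduling on Fargate:", ", ".join(blocked))
```

An empty result means this particular setting is not what keeps the Pod off Fargate; other constraints are outside the scope of this check.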

@z0rc

z0rc commented Feb 26, 2024

/reopen

@k8s-ci-robot
Contributor

@z0rc: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@z0rc

z0rc commented Feb 26, 2024

@Nuru could you reopen the ticket please?

@Nuru
Author

Nuru commented Feb 27, 2024

/reopen

It looks like the changes in #1195 were necessary, but not sufficient.

k8s-ci-robot reopened this Feb 27, 2024

@k8s-ci-robot
Contributor

@Nuru: Reopened this issue.

In response to this:

/reopen

It looks like the changes in #1195 were necessary, but not sufficient.

@sogos

sogos commented Apr 25, 2024

I just ran into the same situation: I can't deploy the add-on because kube-system is a Fargate namespace.
Same context: Karpenter + Fargate cluster.
I will switch to manual installation, but that seems like a waste of time.
Allowing the controllers to run on Fargate would be great, thanks.

@skraga

skraga commented Apr 30, 2024

We're facing the same issue as the previous commenter.

@mskanth972
Contributor

Apologies for the delay in getting back. Our team is currently addressing this issue and will provide a solution soon. Thank you for your patience.

@skraga

skraga commented May 15, 2024

@mskanth972 is there an ETA for when it will be available? I see that a new 2.0.2 add-on has been released, but there is no option to set privileged to false for the controller.

@mskanth972
Contributor

@skraga I have the PR ready. We will merge it and release it in the upcoming version, including the add-on. The ECD is the end of this month.

@skraga

skraga commented May 20, 2024

@mskanth972 Thanks for your reply. In addition, when we were considering the EKS add-on for our use case, we found that it was not possible to set resource requests and limits.

@z0rc

z0rc commented Jun 7, 2024

@mskanth972

"I have the #1348 ready, We will merge this"

That PR was closed without being merged and without an explanation.

"ECD will be by END of this Month."

The end of May has passed with no updates on the issue. Please share the current state of this issue and the plans to address it.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label Sep 5, 2024

@z0rc

z0rc commented Sep 5, 2024

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label Sep 5, 2024