
Kubernetes V1.20 Cannot Dynamic Provision #183

Closed
toneill818 opened this issue Jun 3, 2021 · 11 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@toneill818

/kind bug

What happened?
Error getting claim reference: selfLink was empty. This was removed in 1.20.

What you expected to happen?
Dynamically provision an FSx for Lustre share.

How to reproduce it (as minimally and precisely as possible)?
Follow the dynamic provisioning example.
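
For reference, a StorageClass along the lines of the repo's dynamic provisioning example is enough to hit this; the subnet and security group IDs below are placeholders:

  kind: StorageClass
  apiVersion: storage.k8s.io/v1
  metadata:
    name: aws-fsx
  provisioner: fsx.csi.aws.com
  parameters:
    subnetId: subnet-0123456789abcdef0        # placeholder
    securityGroupIds: sg-0123456789abcdef0    # placeholder

Any PVC that requests this class then sits in Pending while the provisioner logs the selfLink error.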

Anything else we need to know?:
E0603 17:17:29.100846 1 controller.go:1213] provision "X/Y" class "aws-fsx": unexpected error getting claim reference: selfLink was empty, can't make reference

Environment

  • Server Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.4-eks-6b7464", GitCommit:"6b746440c04cb81db4426842b4ae65c3f7035e53", GitTreeState:"clean", BuildDate:"2021-03-19T19:33:03Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}

  • Driver version: v0.4.0

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 3, 2021
@wongma7
Contributor

wongma7 commented Jun 11, 2021

Yes, we are still shipping the external-provisioner v1.3.0 sidecar (https://github.com/kubernetes-sigs/aws-fsx-csi-driver/blob/master/helm/values.yaml#L37); we need to upgrade it to something newer: https://kubernetes-csi.github.io/docs/external-provisioner.html
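
As a sketch of the kind of override that implies, with hypothetical key names (the actual structure depends on the chart version, so check the linked values.yaml):

  # helm values override -- key names here are hypothetical
  sidecars:
    provisioner:
      image:
        repository: k8s.gcr.io/sig-storage/csi-provisioner
        tag: v2.1.1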

@jihed

jihed commented Jun 18, 2021

@wongma7 I am having the same issue with csi-provisioner:v2.1.1, aws-fsx-csi-driver:v0.4.0, and csi-node-driver-registrar:v2.1.0 on EKS (Kubernetes 1.20).

@jdu

jdu commented Jul 7, 2021

Is there a replacement for this on EKS 1.20+? We're using managed node groups, which I don't think let us specify the flag to re-enable the underlying feature, and we rely on this driver for SCRATCH volumes backed by S3 to power a number of things, but it seems to be completely broken on 1.20 in AWS.

Are there any workarounds for a managed node group in EKS?

@jefflantz

jefflantz commented Jul 27, 2021

I'm also running into this issue. To add a bit more detail for anyone who hits this, the error message shows up when you run

kubectl logs fsx-csi-controller-<rest of name> -n kube-system -c csi-provisioner

This may be the issue you're running into if, after setting everything up, your PersistentVolumeClaim never leaves the Pending status and keeps reporting "ExternalProvisioning: waiting for a volume to be created, either by external provisioner 'fsx.csi.aws.com' or manually created by system administrator".
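
The same event can also be inspected directly on the claim (the claim name and namespace are placeholders):

  kubectl describe pvc <claim-name> -n <namespace>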

I'm not using a managed nodegroup, but I'm not sure what @wongma7 means by using the external-provisioner sidecar, or how I would set that up. Is that all I would have to do to get a self-managed nodegroup functional with this?

Also, although I like the idea of dynamically creating the FSx filesystem, it's not a necessary feature for me; I could get by with just connecting to an existing FSx filesystem. But I've found pitifully little documentation on how to do that with Kubernetes. Does anyone have any links/resources?
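
For connecting to an existing filesystem, the repo has a static provisioning example: you create the PersistentVolume yourself and point it at the filesystem, so no provisioner (and no selfLink lookup) is involved. A minimal sketch, where the filesystem ID, DNS name, and mount name are placeholders to be taken from the FSx console:

  apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: fsx-pv
  spec:
    capacity:
      storage: 1200Gi
    accessModes:
      - ReadWriteMany
    persistentVolumeReclaimPolicy: Retain
    csi:
      driver: fsx.csi.aws.com
      volumeHandle: fs-0123456789abcdef0                            # FSx filesystem ID (placeholder)
      volumeAttributes:
        dnsName: fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com   # placeholder
        mountname: fsx                                              # placeholder

A claim can then bind to it by setting storageClassName: "" and volumeName: fsx-pv.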

@wongma7
Contributor

wongma7 commented Jul 27, 2021

The flag is not on the node group, so you don't need to worry about being managed or unmanaged; the fix is in the YAMLs that get deployed to the cluster when you "install" the driver.
You must kubectl edit fsx-csi-controller- -n kube-system so that the image line that contains

  csi-provisioner:v1.3.0

instead says

  csi-provisioner:v2.1.1
However, someone reported above that even this doesn't work, so I can't guarantee it will fix the issue; I haven't had a chance to verify it myself.

I understand the confusion: there are a lot of moving parts, and when the instructions we do offer don't work it's hard to debug. I am working on a release that should include the fix so that you don't have to edit any YAMLs; I'll update this issue when it is done.

@jefflantz

Hi, thanks for your prompt response. I tried

kubectl edit rs fsx-csi-controller -n kube-system

making the change you suggested, but I'm still getting the same error in the csi-provisioner logs: "unexpected error getting claim reference: selfLink was empty, can't make reference". However, I am deploying everything with Terraform and could have made a mistake somewhere along the way, so I will try again using eksctl, the AWS CLI, and kubectl instead and update accordingly.

@wongma7
Contributor

wongma7 commented Aug 5, 2021

Helm chart 1.0 has just been released (https://github.com/kubernetes-sigs/aws-fsx-csi-driver#upgrading-from-version-0x-to-1x-of-the-helm-chart) and it contains the suggested change (a newer csi-provisioner image). I am sure that change is necessary for fixing this issue (https://github.com/kubernetes-csi/external-provisioner/blob/213cd3d4e56fb439b06922ecf85d230a99d4e70d/CHANGELOG/CHANGELOG-1.4.md#bug-fixes), but it seems it may not be sufficient. Our CI currently tests Kubernetes 1.20, but on kops; there might be something different about EKS. I will try to reproduce and update with the result.
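
For anyone upgrading via Helm, assuming the chart is consumed from the project's Helm repository and using placeholder release/namespace values, the upgrade looks roughly like:

  helm repo add aws-fsx-csi-driver https://kubernetes-sigs.github.io/aws-fsx-csi-driver
  helm repo update
  # release name and namespace are assumptions; match your existing install
  helm upgrade --install aws-fsx-csi-driver aws-fsx-csi-driver/aws-fsx-csi-driver \
    --namespace kube-system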

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 3, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 3, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
