
Kubernetes V1.20 Cannot Dynamic Provision #183

Closed
toneill818 opened this issue Jun 3, 2021 · 11 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@toneill818

/kind bug

What happened?
Error getting claim reference: selfLink was empty. This was removed in 1.20.

What you expected to happen?
Dynamically provision an FSx for Lustre share.

How to reproduce it (as minimally and precisely as possible)?
Follow the dynamic provisioning example.
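
For reference, a StorageClass along the lines of the repo's dynamic provisioning example is enough to hit this; the subnet and security group IDs below are placeholders:

  kind: StorageClass
  apiVersion: storage.k8s.io/v1
  metadata:
    name: aws-fsx
  provisioner: fsx.csi.aws.com
  parameters:
    subnetId: subnet-0123456789abcdef0        # placeholder
    securityGroupIds: sg-0123456789abcdef0    # placeholder

Any PVC that requests this class then sits in Pending while the provisioner logs the selfLink error.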

Anything else we need to know?:
E0603 17:17:29.100846 1 controller.go:1213] provision "X/Y" class "aws-fsx": unexpected error getting claim reference: selfLink was empty, can't make reference

Environment

  • Server Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.4-eks-6b7464", GitCommit:"6b746440c04cb81db4426842b4ae65c3f7035e53", GitTreeState:"clean", BuildDate:"2021-03-19T19:33:03Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}

  • Driver version: v0.4.0

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 3, 2021
@wongma7
Contributor

wongma7 commented Jun 11, 2021

Yes, we are still shipping the external-provisioner v1.3.0 sidecar (https://github.com/kubernetes-sigs/aws-fsx-csi-driver/blob/master/helm/values.yaml#L37); we need to upgrade it to something newer: https://kubernetes-csi.github.io/docs/external-provisioner.html
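
As a sketch of the kind of override that implies, with hypothetical key names (the actual structure depends on the chart version, so check the linked values.yaml):

  # helm values override -- key names here are hypothetical
  sidecars:
    provisioner:
      image:
        repository: k8s.gcr.io/sig-storage/csi-provisioner
        tag: v2.1.1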

@jihed

jihed commented Jun 18, 2021

@wongma7 I am having the same issue with csi-provisioner:v2.1.1, aws-fsx-csi-driver:v0.4.0, and csi-node-driver-registrar:v2.1.0 on EKS (Kubernetes 1.20).

@jdu

jdu commented Jul 7, 2021

Is there a replacement for this on EKS 1.20+? We're using managed node groups, which I don't think let us specify the flag to re-enable the underlying feature, and we rely on this driver for SCRATCH volumes backed by S3 to power a number of things, but it seems to be completely broken on 1.20 in AWS.

Are there any workarounds for a managed node group in EKS?

@jefflantz

jefflantz commented Jul 27, 2021

I'm also running into this issue. To add a bit more detail for anyone who hits this, the error message shows up when you run

kubectl logs fsx-csi-controller-<rest of name> -n kube-system -c csi-provisioner

This may be the issue you're running into if, after setting everything up, your PersistentVolumeClaim never leaves the Pending status and keeps reporting "ExternalProvisioning: waiting for a volume to be created, either by external provisioner 'fsx.csi.aws.com' or manually created by system administrator".
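
The same event can also be inspected directly on the claim (the claim name and namespace are placeholders):

  kubectl describe pvc <claim-name> -n <namespace>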

I'm not using a managed nodegroup, but I'm not sure what @wongma7 means by using the external-provisioner sidecar, or how I would set that up. Is that all I would have to do to get a self-managed nodegroup functional with this?

Also, although I like the idea of dynamically creating the FSx filesystem, it's not a necessary feature for me; I could get by with just connecting to an existing FSx filesystem. But I've found pitifully little documentation on how to do that with Kubernetes. Does anyone have any links/resources?
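
For connecting to an existing filesystem, the repo has a static provisioning example: you create the PersistentVolume yourself and point it at the filesystem, so no provisioner (and no selfLink lookup) is involved. A minimal sketch, where the filesystem ID, DNS name, and mount name are placeholders to be taken from the FSx console:

  apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: fsx-pv
  spec:
    capacity:
      storage: 1200Gi
    accessModes:
      - ReadWriteMany
    persistentVolumeReclaimPolicy: Retain
    csi:
      driver: fsx.csi.aws.com
      volumeHandle: fs-0123456789abcdef0                            # FSx filesystem ID (placeholder)
      volumeAttributes:
        dnsName: fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com   # placeholder
        mountname: fsx                                              # placeholder

A claim can then bind to it by setting storageClassName: "" and volumeName: fsx-pv.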

@wongma7
Contributor

wongma7 commented Jul 27, 2021

The flag is not on the node group, so you don't need to worry about being managed or unmanaged; the fix is in the YAMLs that get deployed to the cluster when you "install" the driver.
You must kubectl edit fsx-csi-controller- -n kube-system so that the image line that contains

  csi-provisioner:v1.3.0

instead says

  csi-provisioner:v2.1.1
However, someone reported above that even this doesn't work, so I can't guarantee it will fix the issue; I haven't had a chance to verify it myself.

I understand the confusion: there are a lot of moving parts, and when the instructions we do offer don't work it's hard to debug. I am working on a release that should include the fix so that you don't have to edit any YAMLs; I'll update this issue when it is done.

@jefflantz

Hi, thanks for your prompt response. I tried

kubectl edit rs fsx-csi-controller -n kube-system

making the change you suggested, but I'm still getting the same error in the csi-provisioner logs: "unexpected error getting claim reference: selfLink was empty, can't make reference". However, I am deploying everything with Terraform and could have made a mistake somewhere along the way, so I will try again using eksctl, the AWS CLI, and kubectl instead and update accordingly.

@wongma7
Contributor

wongma7 commented Aug 5, 2021

Helm chart 1.0 has just been released (https://github.com/kubernetes-sigs/aws-fsx-csi-driver#upgrading-from-version-0x-to-1x-of-the-helm-chart) and it contains the suggested change (a newer csi-provisioner image). I am sure that change is necessary for fixing this issue (https://github.com/kubernetes-csi/external-provisioner/blob/213cd3d4e56fb439b06922ecf85d230a99d4e70d/CHANGELOG/CHANGELOG-1.4.md#bug-fixes), but it seems it may not be sufficient. Our CI currently tests Kubernetes 1.20, but on kops; there might be something different about EKS. I will try to reproduce and update with the result.
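
For anyone upgrading via Helm, assuming the chart is consumed from the project's Helm repository and using placeholder release/namespace values, the upgrade looks roughly like:

  helm repo add aws-fsx-csi-driver https://kubernetes-sigs.github.io/aws-fsx-csi-driver
  helm repo update
  # release name and namespace are assumptions; match your existing install
  helm upgrade --install aws-fsx-csi-driver aws-fsx-csi-driver/aws-fsx-csi-driver \
    --namespace kube-system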

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 3, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 3, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
