
[RBAC] Helm Chart 3.10.0: csi-rbdplugin container enters CrashLoopBackoff with failed to get node error #4306

Closed
remisauvat opened this issue Dec 6, 2023 · 5 comments · Fixed by #4302

Comments

@remisauvat

Describe the bug

Hello,
After upgrading the Helm chart from 3.9.0 to 3.10.0, the csi-rbdplugin container crashes in a loop with a permissions error when fetching the nodes resource.

F1206 16:31:23.921143       1 driver.go:131] failed to get node "xxxxxxxx" information: nodes "xxxxxxxx" is forbidden: User "system:serviceaccount:ceph-csi-rbd:cph-cs-rbd-ceph-csi-rbd-provisioner" cannot get resource "nodes" in API group "" at the cluster scope

This is probably related to the changes from #4165, but the RBAC rules for the service account do not include the new permission required to fetch node labels.
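For reference, the permission the driver now tries to use corresponds to a standard Kubernetes RBAC rule like the one below. This is a sketch based on the error message in this issue; the exact verb list the driver needs is an assumption (the log only shows `get` failing):

```yaml
# Assumed RBAC rule for reading node objects (and their labels).
# Only "get" is confirmed by the error message; "list"/"watch" are guesses.
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch"]
```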

Environment details

  • Image/version of Ceph CSI driver : 3.10.0
  • Helm chart version : 3.10.0
  • Kernel version : 5.15.0-89-generic
  • Mounter used for mounting PVC (for cephFS its fuse or kernel. for rbd its
    krbd or rbd-nbd) :
  • Kubernetes cluster version : 1.27.3
  • Ceph cluster version :

Steps to reproduce

Steps to reproduce the behavior:

  1. Deploy helm chart ceph-csi/ceph-csi-rbd with version 3.10.0.
  2. Make sure rbac.create: true is set in values.yaml
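For step 2, the relevant values.yaml fragment looks like this (a minimal sketch; all other chart values left at their defaults):

```yaml
# values.yaml for the ceph-csi/ceph-csi-rbd chart
rbac:
  create: true
```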

Actual results

Pod csi-rbd-provisioner is in CrashLoopBackOff state due to a failure in the csi-rbdplugin container.

Expected behavior

The container and pod should not crash

Logs

csi-rbdplugin:

I1206 16:31:23.907599       1 cephcsi.go:191] Driver version: v3.10.0 and Git version: 24ae2a7a062b3e58746bb9cc6d5737e37a7e771c
I1206 16:31:23.907720       1 cephcsi.go:223] Starting driver type: rbd with name: rbd.csi.ceph.com
I1206 16:31:23.907750       1 driver.go:94] Enabling controller service capability: CREATE_DELETE_VOLUME
I1206 16:31:23.907755       1 driver.go:94] Enabling controller service capability: CREATE_DELETE_SNAPSHOT
I1206 16:31:23.907761       1 driver.go:94] Enabling controller service capability: CLONE_VOLUME
I1206 16:31:23.907765       1 driver.go:94] Enabling controller service capability: EXPAND_VOLUME
I1206 16:31:23.907770       1 driver.go:107] Enabling volume access mode: SINGLE_NODE_WRITER
I1206 16:31:23.907774       1 driver.go:107] Enabling volume access mode: MULTI_NODE_MULTI_WRITER
I1206 16:31:23.907778       1 driver.go:107] Enabling volume access mode: SINGLE_NODE_SINGLE_WRITER
I1206 16:31:23.907918       1 driver.go:107] Enabling volume access mode: SINGLE_NODE_MULTI_WRITER
F1206 16:31:23.921143       1 driver.go:131] failed to get node "xxxxxxx" information: nodes "xxxxxxx" is forbidden: User "system:serviceaccount:ceph-csi-rbd:cph-cs-rbd-ceph-csi-rbd-provisioner" cannot get resource "nodes" in API group "" at the cluster scope

Additional context

I don't think it's the same issue as #4298. Setting readAffinity.enabled to true or false doesn't change the issue.

@Rakshith-R Rakshith-R linked a pull request Dec 7, 2023 that will close this issue
@Rakshith-R
Contributor

@remisauvat The linked PR should solve the issue. We'll include the fix in v3.10.1 soon.

As a workaround, could you please add the required ClusterRole to the provisioner RBAC as well?
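A minimal sketch of that workaround is below. The resource names are hypothetical, and the service-account name and namespace are taken from the error message in this issue; adjust them to match your release:

```yaml
# Hypothetical workaround manifest: grants the provisioner service
# account cluster-wide read access to nodes. All names here are
# assumptions based on the error message above, not chart output.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ceph-csi-rbd-provisioner-nodes
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ceph-csi-rbd-provisioner-nodes
subjects:
  - kind: ServiceAccount
    name: cph-cs-rbd-ceph-csi-rbd-provisioner   # from the error message
    namespace: ceph-csi-rbd
roleRef:
  kind: ClusterRole
  name: ceph-csi-rbd-provisioner-nodes
  apiGroup: rbac.authorization.k8s.io
```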

@remisauvat
Author

The linked PR will not solve the issue because it only adds get nodes permission to the nodeplugin clusterrole.

The provisioner service account is linked only to a Role which cannot get nodes. So I think there is a need to create a new clusterrole for the provisioner to allow get nodes.

I manually patched a clusterrole for the provisioner and also patched for #4297 and now it works. I am in a lab cluster so I will revert to v3.9.0 and wait for a chart version that can fix this. I am sorry I am not able to provide a PR for this.

@Rakshith-R
Contributor

> The linked PR will not solve the issue because it only adds get nodes permission to the nodeplugin clusterrole.
>
> The provisioner service account is linked only to a Role which cannot get nodes. So I think there is a need to create a new clusterrole for the provisioner to allow get nodes.
>
> I manually patched a clusterrole for the provisioner and also patched for #4297 and now it works. I am in a lab cluster so I will revert to v3.9.0 and wait for a chart version that can fix this. I am sorry I am not able to provide a PR for this.

The latest commit in that PR moves the code to run only in the node server, which will solve the issue.

@remisauvat
Author

Oh, I didn't get that. Then you are right, it should solve the issue.
I will wait for 3.10.1 to test it.

Thank you

@XtremeOwnageDotCom

XtremeOwnageDotCom commented Dec 8, 2023

Here is a single-line command to fix the issue for now:

```shell
kubectl patch ClusterRole ceph-csi-rbd-nodeplugin --type=json \
  -p='[{"op":"add","path":"/rules/-","value":{"apiGroups":[""],"resources":["nodes"],"verbs":["get","list","watch"]}}]'
```

#4302 will fix this.

@mergify mergify bot closed this as completed in #4302 Dec 11, 2023