
Unable to create pvc using cephfs #3730

Closed
franitel opened this issue Mar 29, 2023 · 12 comments
Labels
component/cephfs (Issues related to CephFS), component/deployment (Helm chart, kubernetes templates and configuration Issues/PRs), question (Further information is requested), wontfix (This will not be worked on)

Comments

@franitel

Describe the bug

We have a CephFS cluster and have deployed the ceph-csi-cephfs chart in our Kubernetes cluster (v1.19.9) with the following values:

USER-SUPPLIED VALUES:
csiConfig:
- cephFS:
    subvolumeGroup: csi
  clusterID: 8fxxxxxxxxxxxxxxxxxxxxxxxa0
  monitors:
  - 172.22.14.201:6789
  - 172.22.14.202:6789
  - 172.22.14.203:6789
provisioner:
  replicaCount: 1
secret:
  adminID: admin
  adminKey: AQCccccccccccccccccccccc==
  create: true
storageClass:
  allowVolumeExpansion: true
  clusterID: 8fxxxxxxxxxxxxxxxxxxxxxxxa0
  create: true
  fsName: k8sfs
  name: csi-cephfs-sc
  reclaimPolicy: Delete

We have verified that we can mount the volume on the Kubernetes nodes using the following command:

mount -v -t ceph -o name=admin,secretfile=ceph.keyring 172.22.14.203:6789:/k8scephfs /srv/cephfs3/

but when we try to deploy a PVC, it stays in Pending status:

[screenshot: PVC stuck in Pending]

This is the Ceph status:

[screenshots: ceph status output]

Here you can see the logs (ceph.audit.log) when we try to deploy the PVC:

[screenshot: ceph.audit.log entries]


Environment details

  • Image/version of Ceph CSI driver : quay.io/cephcsi/cephcsi:v3.8.0 (we first tested v3.4.0 with the same result)

  • Helm chart version : ceph-csi/ceph-csi-cephfs 3.8.0

  • Kernel version : Linux vm-k8s-test-worker-1 4.15.0-109-generic #110-Ubuntu SMP Tue Jun 23 02:39:32 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

  • Mounter used for mounting PVC (for CephFS it's fuse or kernel; for RBD it's krbd or rbd-nbd) :

  • Kubernetes cluster version :

  • Ceph cluster version :

Steps to reproduce

Steps to reproduce the behavior:

  1. Deploy the chart with the previous values
  2. Deploy a PVC and a StatefulSet
  3. See the error: the PVC stays in Pending status and never binds

Actual results

When we try to deploy a PVC, it gets stuck in Pending status.

Expected behavior

The pod runs with the PVC mounted inside it.

Logs

  • csi-rbdplugin/csi-cephfsplugin and driver-registrar container logs from
    plugin pod from the node where the mount is failing.
I0329 10:59:56.394232       1 utils.go:195] ID: 26 Req-ID: pvc-7edca0a0-f18f-4de1-9ba9-73bf239ab424 GRPC call: /csi.v1.Controller/CreateVolume
I0329 10:59:56.394554       1 utils.go:206] ID: 26 Req-ID: pvc-7edca0a0-f18f-4de1-9ba9-73bf239ab424 GRPC request: {"capacity_range":{"required_bytes":1073741824},"name":"pvc-7edca0a0-f18f-4de1-9ba9-73bf239ab424","parameters":{"clusterID":"8f7b6e44-5b7a-48f7-83c5-dd83fb0b7ea0","csi.storage.k8s.io/pv/name":"pvc-7edca0a0-f18f-4de1-9ba9-73bf239ab424","csi.storage.k8s.io/pvc/name":"www-web-0","csi.storage.k8s.io/pvc/namespace":"storage","fsName":"k8s_fs"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{}},"access_mode":{"mode":7}}]}
E0329 10:59:56.394667       1 controllerserver.go:269] ID: 26 Req-ID: pvc-7edca0a0-f18f-4de1-9ba9-73bf239ab424 an operation with the given Volume ID pvc-7edca0a0-f18f-4de1-9ba9-73bf239ab424 already exists
E0329 10:59:56.394697       1 utils.go:210] ID: 26 Req-ID: pvc-7edca0a0-f18f-4de1-9ba9-73bf239ab424 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID pvc-7edca0a0-f18f-4de1-9ba9-73bf239ab424 already exists
I0329 11:00:28.774657       1 utils.go:195] ID: 27 GRPC call: /csi.v1.Identity/Probe
I0329 11:00:28.774902       1 utils.go:206] ID: 27 GRPC request: {}
I0329 11:00:28.774974       1 utils.go:212] ID: 27 GRPC response: {}
I0329 11:01:28.749331       1 utils.go:195] ID: 28 GRPC call: /csi.v1.Identity/Probe
I0329 11:01:28.751994       1 utils.go:206] ID: 28 GRPC request: {}
I0329 11:01:28.752024       1 utils.go:212] ID: 28 GRPC response: {}
I0329 11:02:04.408556       1 utils.go:195] ID: 29 Req-ID: pvc-7edca0a0-f18f-4de1-9ba9-73bf239ab424 GRPC call: /csi.v1.Controller/CreateVolume
I0329 11:02:04.408948       1 utils.go:206] ID: 29 Req-ID: pvc-7edca0a0-f18f-4de1-9ba9-73bf239ab424 GRPC request: {"capacity_range":{"required_bytes":1073741824},"name":"pvc-7edca0a0-f18f-4de1-9ba9-73bf239ab424","parameters":{"clusterID":"8f7b6e44-5b7a-48f7-83c5-dd83fb0b7ea0","csi.storage.k8s.io/pv/name":"pvc-7edca0a0-f18f-4de1-9ba9-73bf239ab424","csi.storage.k8s.io/pvc/name":"www-web-0","csi.storage.k8s.io/pvc/namespace":"storage","fsName":"k8s_fs"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{}},"access_mode":{"mode":7}}]}
E0329 11:02:04.409167       1 controllerserver.go:269] ID: 29 Req-ID: pvc-7edca0a0-f18f-4de1-9ba9-73bf239ab424 an operation with the given Volume ID pvc-7edca0a0-f18f-4de1-9ba9-73bf239ab424 already exists
E0329 11:02:04.409292       1 utils.go:210] ID: 29 Req-ID: pvc-7edca0a0-f18f-4de1-9ba9-73bf239ab424 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID pvc-7edca0a0-f18f-4de1-9ba9-73bf239ab424 already exists
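The `Aborted ... already exists` errors above are a symptom rather than the root cause: the first CreateVolume attempt timed out (the provisioner's `DeadlineExceeded`) but is still tracked in the driver's in-flight operation list, so every retry for the same PVC is rejected. A rough Python sketch of that guard pattern (hypothetical names; ceph-csi's actual implementation is in Go and differs in detail):

```python
# Sketch of the "operation already exists" guard CSI drivers use to make
# CreateVolume idempotent. Names are hypothetical, not ceph-csi's code.
class InFlightError(Exception):
    pass

class VolumeLocks:
    def __init__(self):
        self._in_flight = set()

    def try_acquire(self, volume_id: str) -> None:
        # Reject a concurrent/retried request for the same volume while the
        # first one is still running -- this is the "Aborted" in the logs.
        if volume_id in self._in_flight:
            raise InFlightError(
                f"an operation with the given Volume ID {volume_id} already exists"
            )
        self._in_flight.add(volume_id)

    def release(self, volume_id: str) -> None:
        self._in_flight.discard(volume_id)

locks = VolumeLocks()
locks.try_acquire("pvc-7edca0a0")      # first CreateVolume starts, then hangs
try:
    locks.try_acquire("pvc-7edca0a0")  # the provisioner's retry is rejected
except InFlightError as e:
    print(e)
locks.release("pvc-7edca0a0")          # only after the first call finishes
locks.try_acquire("pvc-7edca0a0")      # can a retry proceed
```

In other words, the question is why the first call hangs long enough to hit the deadline, not why the retries abort.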

If the issue is in PVC creation, deletion, cloning please attach complete logs
of below containers.

  • csi-provisioner and csi-rbdplugin/csi-cephfsplugin container logs from the
    provisioner pod.
I0329 10:54:48.792549       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"storage", Name:"www-web-0", UID:"7edca0a0-f18f-4de1-9ba9-73bf239ab424", APIVersion:"v1", ResourceVersion:"128546450", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "storage/www-web-0"
W0329 10:57:48.792937       1 controller.go:934] Retrying syncing claim "7edca0a0-f18f-4de1-9ba9-73bf239ab424", failure 0
E0329 10:57:48.793184       1 controller.go:957] error syncing claim "7edca0a0-f18f-4de1-9ba9-73bf239ab424": failed to provision volume with StorageClass "csi-cephfs-sc": rpc error: code = DeadlineExceeded desc = context deadline exceeded
I0329 10:57:48.793844       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"storage", Name:"www-web-0", UID:"7edca0a0-f18f-4de1-9ba9-73bf239ab424", APIVersion:"v1", ResourceVersion:"128546450", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "csi-cephfs-sc": rpc error: code = DeadlineExceeded desc = context deadline exceeded
I0329 10:57:49.293732       1 controller.go:1337] provision "storage/www-web-0" class "csi-cephfs-sc": started
I0329 10:57:49.294592       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"storage", Name:"www-web-0", UID:"7edca0a0-f18f-4de1-9ba9-73bf239ab424", APIVersion:"v1", ResourceVersion:"128546450", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "storage/www-web-0"
W0329 10:57:49.303371       1 controller.go:934] Retrying syncing claim "7edca0a0-f18f-4de1-9ba9-73bf239ab424", failure 1
E0329 10:57:49.303401       1 controller.go:957] error syncing claim "7edca0a0-f18f-4de1-9ba9-73bf239ab424": failed to provision volume with StorageClass "csi-cephfs-sc": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-7edca0a0-f18f-4de1-9ba9-73bf239ab424 already exists
I0329 10:57:49.303416       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"storage", Name:"www-web-0", UID:"7edca0a0-f18f-4de1-9ba9-73bf239ab424", APIVersion:"v1", ResourceVersion:"128546450", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "csi-cephfs-sc": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-7edca0a0-f18f-4de1-9ba9-73bf239ab424 already exists
I0329 10:57:50.303664       1 controller.go:1337] provision "storage/www-web-0" class "csi-cephfs-sc": started
I0329 10:57:50.303858       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"storage", Name:"www-web-0", UID:"7edca0a0-f18f-4de1-9ba9-73bf239ab424", APIVersion:"v1", ResourceVersion:"128546450", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "storage/www-web-0"




W0329 10:57:50.322911       1 controller.go:934] Retrying syncing claim "7edca0a0-f18f-4de1-9ba9-73bf239ab424", failure 2
E0329 10:57:50.323268       1 controller.go:957] error syncing claim "7edca0a0-f18f-4de1-9ba9-73bf239ab424": failed to provision volume with StorageClass "csi-cephfs-sc": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-7edca0a0-f18f-4de1-9ba9-73bf239ab424 already exists
I0329 10:57:50.323380       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"storage", Name:"www-web-0", UID:"7edca0a0-f18f-4de1-9ba9-73bf239ab424", APIVersion:"v1", ResourceVersion:"128546450", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "csi-cephfs-sc": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-7edca0a0-f18f-4de1-9ba9-73bf239ab424 already exists

If you need any other data, please just tell me.

Francisco Rodriguez

@Madhu-1
Collaborator

Madhu-1 commented Mar 29, 2023

@franitel please check whether it is a network connectivity issue; https://rook.io/docs/rook/latest/Troubleshooting/ceph-csi-common-issues/#ceph-health can help you with the debugging steps.

@franitel
Author

Hi Madhu,
I'm going to check.
Thanks for your quick response!!

@ppodevlabs

@franitel please check whether it is a network connectivity issue; https://rook.io/docs/rook/latest/Troubleshooting/ceph-csi-common-issues/#ceph-health can help you with the debugging steps.

We have checked this before and it seems to be working fine:

root@ceph-mon-test1:/home/pedrop# ceph health detail
HEALTH_OK
❯ k exec -it -n storage ceph-csi-driver-ceph-csi-cephfs-provisioner-7f84fc97fb-j9rwh -c csi-cephfsplugin -- /bin/bash
[root@vm-k8s-test-worker-2 /]# curl 172.22.14.201:3300 2>/dev/null
ceph v2

@Madhu-1
Collaborator

Madhu-1 commented Mar 29, 2023

monitors:

  • 172.22.14.201:6789
  • 172.22.14.202:6789
  • 172.22.14.203:6789

Did you check for the 6789 port as well?

@ppodevlabs

monitors:

  • 172.22.14.201:6789
  • 172.22.14.202:6789
  • 172.22.14.203:6789

Did you check for the 6789 port as well?

yes

root@vm-k8s-test-worker-2:/home/pedrop# nc -zv 172.22.14.201 6789
Connection to 172.22.14.201 6789 port [tcp/*] succeeded!
❯ k exec -it -n storage ceph-csi-driver-ceph-csi-cephfs-provisioner-7f84fc97fb-j9rwh -c csi-cephfsplugin -- /bin/bash
[root@vm-k8s-test-worker-2 /]# curl 172.22.14.201:6789 2>/dev/null
[root@vm-k8s-test-worker-2 /]#
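The `nc`/`curl` probes above can also be scripted for all three monitors at once; a minimal Python sketch (the monitor IPs are the ones from this issue's csiConfig; the helper name is hypothetical):

```python
import socket

def monitor_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Scripted equivalent of `nc -zv host port`: True if a TCP connection
    to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Monitor endpoints from the csiConfig in this issue; run this from inside
# the provisioner pod to test reachability from the CSI driver's netns.
for mon in ("172.22.14.201", "172.22.14.202", "172.22.14.203"):
    print(mon, "port 6789 reachable:", monitor_reachable(mon, 6789))
```

Note this only proves the TCP path is open, as the checks above do; it says nothing about cephx authentication or MDS health.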

@Madhu-1
Collaborator

Madhu-1 commented Mar 29, 2023

@ppodevlabs are you able to execute ceph commands from the container? See https://www.mrajanna.com/troubleshooting-cephcsi/; it might help you.

@ppodevlabs

ppodevlabs commented Mar 29, 2023

@Madhu-1 I've managed to execute commands from the plugin container, but I have to specify the user/keyring. Could it be that the provisioner is not fetching the configuration properly?

k exec -it -n storage ceph-csi-driver-ceph-csi-cephfs-provisioner-9584bc97-j6vgq -c csi-cephfsplugin -- /bin/bash
[root@vm-k8s-test-worker-2 /]# ceph status --user=admin --key=my_key
  cluster:
    id:     8f7b6e44-5b7a-48f7-83c5-dd83fb0b7ea0
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph-mon-test2,ceph-mon-test3,ceph-mon-test1 (age 6h)
    mgr: ceph-mon-test3(active, since 5d), standbys: ceph-mon-test2, ceph-mon-test1
    mds: k8s_fs:1 {0=ceph-mon-test1=up:active} 2 up:standby
    osd: 6 osds: 6 up (since 5d), 6 in (since 8d)
    rgw: 3 daemons active (radosgw.ceph-mon-test1, radosgw.ceph-mon-test2, radosgw.ceph-mon-test3)

  task status:

  data:
    pools:   21 pools, 337 pgs
    objects: 564 objects, 914 MiB
    usage:   8.7 GiB used, 51 GiB / 60 GiB avail
    pgs:     337 active+clean

The keyring file in /etc/ceph/keyring is empty.

@ppodevlabs

@ppodevlabs are you able to execute ceph commands from the container? See https://www.mrajanna.com/troubleshooting-cephcsi/; it might help you.

We just did a test creating static PVCs following https://github.com/ceph/ceph-csi/blob/devel/docs/static-pvc.md#cephfs-static-pvc. We created the volume and volume group from the plugin container within the provisioner pod, and we can mount static volumes into pods... so I do not think it is a network issue.

@Madhu-1
Collaborator

Madhu-1 commented Mar 30, 2023

@ppodevlabs I am not sure what the problem is with your setup. One thing: you need to pass the monitor, user, and key mentioned in the StorageClass and ConfigMap when you execute commands from the provisioner pod, because those are not available there by default.

@nixpanic nixpanic added component/cephfs Issues related to CephFS component/deployment Helm chart, kubernetes templates and configuration Issues/PRs labels Mar 31, 2023
@nixpanic
Member

Kubernetes v1.19.9 is rather old and unmaintained. We do not test recent Ceph-CSI versions against that version anymore; possibly something broke and recent kubernetes-csi sidecars are not compatible with the old version?

@nixpanic nixpanic added the question Further information is requested label Mar 31, 2023
@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the wontfix This will not be worked on label Apr 30, 2023
@github-actions

github-actions bot commented May 7, 2023

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale May 7, 2023