failed to delete pv #192

Closed
Madhu-1 opened this issue Feb 12, 2019 · 9 comments
Comments

@Madhu-1
Collaborator

Madhu-1 commented Feb 12, 2019

Describe the bug
Failed to delete the PV when the PVC was deleted.

Kubernetes and Ceph CSI Versions
Kubernetes: 1.13
Ceph CSI: 1.0.0

To Reproduce
Create 100 PVCs and then delete all 100 PVCs.

Expected behavior
All PVs should be deleted when their PVCs are deleted.

[🎩︎]mrajanna@localhost ceph $]kube get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                      STORAGECLASS   REASON   AGE
pvc-6247d340-2e96-11e9-af82-525400b639ff   5Gi        RWX            Delete           Released   default/csi-cephfs-pvc93   csi-cephfs              32m
pvc-6276de8e-2e96-11e9-af82-525400b639ff   5Gi        RWX            Delete           Released   default/csi-cephfs-pvc94   csi-cephfs              32m
pvc-62a2be3a-2e96-11e9-af82-525400b639ff   5Gi        RWX            Delete           Released   default/csi-cephfs-pvc95   csi-cephfs              32m
pvc-62cce258-2e96-11e9-af82-525400b639ff   5Gi        RWX            Delete           Released   default/csi-cephfs-pvc96   csi-cephfs              32m
[🎩︎]mrajanna@localhost ceph $]kube get pvc
No resources found.
[🎩︎]mrajanna@localhost ceph $]
[🎩︎]mrajanna@localhost ceph $]kube describe pv/pvc-6247d340-2e96-11e9-af82-525400b639ff
Name:            pvc-6247d340-2e96-11e9-af82-525400b639ff
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: csi-cephfsplugin
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    csi-cephfs
Status:          Released
Claim:           default/csi-cephfs-pvc93
Reclaim Policy:  Delete
Access Modes:    RWX
VolumeMode:      Filesystem
Capacity:        5Gi
Node Affinity:   <none>
Message:         
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            csi-cephfsplugin
    VolumeHandle:      csi-cephfs-pvc-6247d340-2e96-11e9-af82-525400b639ff
    ReadOnly:          false
    VolumeAttributes:      monitors=rook-ceph-mon-a.rook-ceph.svc.cluster.local:6789
                           pool=myfs-data0
                           provisionVolume=true
                           storage.kubernetes.io/csiProvisionerIdentity=1549952614808-8081-
Events:
  Type     Reason              Age                From                                                                                  Message
  ----     ------              ----               ----                                                                                  -------
  Warning  VolumeFailedDelete  28m                csi-cephfsplugin_csi-cephfsplugin-provisioner-0_b709bc9c-2e8f-11e9-a968-0a580ae94212  rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  VolumeFailedDelete  12m (x6 over 27m)  csi-cephfsplugin_csi-cephfsplugin-provisioner-0_b709bc9c-2e8f-11e9-a968-0a580ae94212  rpc error: code = Internal desc = configmaps "csi-cephfs-pvc-6247d340-2e96-11e9-af82-525400b639ff" not found

[🎩︎]mrajanna@localhost ceph $]kube get  cm
No resources found.
@rootfs
Member

rootfs commented Feb 12, 2019

As discussed with @ShyamsundarR, the current per-PV configmap has a scaling issue. We aim to shift to a per-Ceph-cluster configmap and embed the configmap name in the PV name, so the controller server can look up the configmap and get the Ceph config from it.
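For illustration only, a minimal sketch of what such a lookup could look like, assuming a recent client-go with a context-aware Get, a hypothetical ceph-cluster-&lt;id&gt; configmap name, and a cluster ID embedded as the first dash-separated field of the volume handle; none of these names come from the driver itself:

```go
package cephconfig

import (
	"context"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// getCephConfig is a hypothetical helper illustrating the proposed scheme:
// derive a per-cluster configmap name from the volume handle and read the
// Ceph connection details from that single shared configmap, instead of
// keeping one configmap per PV.
func getCephConfig(ctx context.Context, client kubernetes.Interface, namespace, volumeHandle string) (map[string]string, error) {
	// Assumption: the cluster ID is the first dash-separated field of the
	// volume handle, e.g. "cluster1-csi-cephfs-pvc-6247d340-...".
	clusterID := strings.SplitN(volumeHandle, "-", 2)[0]

	// One configmap per Ceph cluster; its name is derived, not stored per PV.
	cm, err := client.CoreV1().ConfigMaps(namespace).Get(ctx, "ceph-cluster-"+clusterID, metav1.GetOptions{})
	if err != nil {
		return nil, err
	}
	return cm.Data, nil
}
```

The point of the design is that deleting a PV would then only require the shared per-cluster configmap to exist, rather than a per-PV configmap that may already have been cleaned up.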

@gman0
Contributor

gman0 commented Feb 12, 2019

This might be related to kubernetes-csi/external-provisioner#131: what might be happening is that while one DeleteVolume RPC (A) is still in progress, the 10s timeout is hit and another DeleteVolume RPC (B) is issued. Meanwhile, (A) finishes successfully and deletes the configmap. From (B)'s perspective, it fails to find the CM and returns an error - and that's what the external-provisioner ultimately sees. This is just a guess, but nevertheless a plausible one.
@Madhu-1, could you please check whether the actual volume directories on the Ceph cluster are deleted? Output from ceph auth ls should suffice (once the volume is deleted, the Ceph user is deleted as well - no volume = no user).

@Madhu-1
Collaborator Author

Madhu-1 commented Feb 12, 2019

@gman0 I destroyed the setup, but I will try once again. From the logs, it looks like the issue is with the configmap (I need to investigate in which case I am hitting this issue, why I am getting the "configmap not found" error, and when the configmap got deleted).

@Madhu-1
Collaborator Author

Madhu-1 commented Feb 12, 2019

@gman0 what should we do if the configmap is not found? Currently we are returning an internal server error.

@gman0
Contributor

gman0 commented Feb 12, 2019

@Madhu-1 That's one of the harder problems - there is no way to distinguish between "volume already deleted" and "cannot delete a volume because of missing metadata". See the issue I've linked above.

@dllozhkin

Hello,
I have the same problem; I can't delete the PV.

Events:
  Type     Reason              Age                  From                                                                            Message
  ----     ------              ----                 ----                                                                            -------
  Warning  VolumeFailedDelete  7m10s (x7 over 38m)  csi-rbdplugin_csi-rbdplugin-provisioner-0_a8c39bb5-2dfb-11e9-a5c5-02fa171bccad  rpc error: code = Unknown desc = configmaps "csi-rbd-vol-77234b68-3037-11e9-8f75-0050562c0165" not found

When will this be fixed?

@gman0
Contributor

gman0 commented Feb 14, 2019

Got hit by this in rbdplugin as well.

Some relevant logs:

I0214 20:41:49.129431       1 utils.go:97] GRPC call: /csi.v1.Controller/DeleteVolume
I0214 20:41:49.129783       1 utils.go:98] GRPC request: {"secrets":"***stripped***","volume_id":"csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086"}
I0214 20:41:49.133904       1 hashed.go:47] hashedKeyMutex.LockKey(...) called for id "csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086"
I0214 20:41:49.133970       1 hashed.go:49] hashedKeyMutex.LockKey(...) for id "csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086" completed.
I0214 20:41:49.160288       1 controllerserver.go:266] deleting volume pvc-bda0240b-3098-11e9-9a4d-52540013b951
I0214 20:41:49.160557       1 rbd_util.go:166] rbd: status pvc-bda0240b-3098-11e9-9a4d-52540013b951 using mon 192.168.122.15:6789, pool rbd
W0214 20:41:49.279994       1 rbd_util.go:188] rbd: no watchers on pvc-bda0240b-3098-11e9-9a4d-52540013b951
I0214 20:41:49.280017       1 rbd_util.go:213] rbd: rm pvc-bda0240b-3098-11e9-9a4d-52540013b951 using mon 192.168.122.15:6789, pool rbd
I0214 20:42:14.415483       1 utils.go:97] GRPC call: /csi.v1.Identity/GetPluginCapabilities
I0214 20:42:14.420742       1 utils.go:98] GRPC request: {}
I0214 20:42:14.426830       1 utils.go:103] GRPC response: {"capabilities":[{"Type":{"Service":{"type":1}}}]}
I0214 20:42:14.438932       1 utils.go:97] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0214 20:42:14.439447       1 utils.go:98] GRPC request: {}
I0214 20:42:14.443723       1 controllerserver-default.go:62] Using default ControllerGetCapabilities
I0214 20:42:14.443735       1 utils.go:103] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}}]}
I0214 20:42:14.463962       1 utils.go:97] GRPC call: /csi.v1.Identity/GetPluginInfo
I0214 20:42:14.464232       1 utils.go:98] GRPC request: {}
I0214 20:42:14.468072       1 identityserver-default.go:32] Using default GetPluginInfo
I0214 20:42:14.468082       1 utils.go:103] GRPC response: {"name":"csi-rbdplugin","vendor_version":"1.0.0"}
I0214 20:42:15.343733       1 utils.go:97] GRPC call: /csi.v1.Controller/DeleteVolume
I0214 20:42:15.344088       1 utils.go:98] GRPC request: {"secrets":"***stripped***","volume_id":"csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086"}
I0214 20:42:15.347633       1 hashed.go:47] hashedKeyMutex.LockKey(...) called for id "csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086"
I0214 20:42:35.647674       1 k8scmcache.go:177] k8s-cm-cache: successfully deleted metadata configmap csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086
I0214 20:42:35.653706       1 hashed.go:54] hashedKeyMutex.UnlockKey(...) called for id "csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086"
I0214 20:42:35.653738       1 hashed.go:56] hashedKeyMutex.UnlockKey(...) for id "csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086" completed.
I0214 20:42:35.653759       1 utils.go:103] GRPC response: {}
I0214 20:42:35.654202       1 hashed.go:49] hashedKeyMutex.LockKey(...) for id "csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086" completed.
I0214 20:42:35.662184       1 hashed.go:54] hashedKeyMutex.UnlockKey(...) called for id "csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086"
I0214 20:42:35.662446       1 hashed.go:56] hashedKeyMutex.UnlockKey(...) for id "csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086" completed.
E0214 20:42:35.667241       1 utils.go:101] GRPC error: configmaps "csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086" not found
I0214 20:42:56.323471       1 utils.go:97] GRPC call: /csi.v1.Identity/GetPluginCapabilities
I0214 20:42:56.323740       1 utils.go:98] GRPC request: {}
I0214 20:42:56.331733       1 utils.go:103] GRPC response: {"capabilities":[{"Type":{"Service":{"type":1}}}]}
I0214 20:42:56.342643       1 utils.go:97] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0214 20:42:56.342822       1 utils.go:98] GRPC request: {}
I0214 20:42:56.345258       1 controllerserver-default.go:62] Using default ControllerGetCapabilities
I0214 20:42:56.345269       1 utils.go:103] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}}]}
I0214 20:42:56.366441       1 utils.go:97] GRPC call: /csi.v1.Identity/GetPluginInfo
I0214 20:42:56.370258       1 utils.go:98] GRPC request: {}
I0214 20:42:56.370936       1 identityserver-default.go:32] Using default GetPluginInfo
I0214 20:42:56.370945       1 utils.go:103] GRPC response: {"name":"csi-rbdplugin","vendor_version":"1.0.0"}
I0214 20:42:57.348685       1 utils.go:97] GRPC call: /csi.v1.Controller/DeleteVolume
I0214 20:42:57.354895       1 utils.go:98] GRPC request: {"secrets":"***stripped***","volume_id":"csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086"}
I0214 20:42:57.355676       1 hashed.go:47] hashedKeyMutex.LockKey(...) called for id "csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086"
I0214 20:42:57.358931       1 hashed.go:49] hashedKeyMutex.LockKey(...) for id "csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086" completed.
I0214 20:42:57.898229       1 hashed.go:54] hashedKeyMutex.UnlockKey(...) called for id "csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086"
I0214 20:42:57.898596       1 hashed.go:56] hashedKeyMutex.UnlockKey(...) for id "csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086" completed.
E0214 20:42:57.902772       1 utils.go:101] GRPC error: configmaps "csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086" not found

But on the Ceph cluster:

$ rbd ls

So the image is actually deleted, but the external-provisioner ignores the success (kubernetes-csi/external-provisioner#131).

Looking at the logs, the first delete attempt (A) takes too long to finish, so the external-provisioner issues a new DeleteVolume RPC (B):

I0214 20:42:15.343733       1 utils.go:97] GRPC call: /csi.v1.Controller/DeleteVolume

(A) holds the lock, though, so even though (B) is the "active" RPC (which will have the definitive answer on the result of the operation), it has to wait until (A) finishes:

I0214 20:42:15.343733       1 utils.go:97] GRPC call: /csi.v1.Controller/DeleteVolume
I0214 20:42:15.344088       1 utils.go:98] GRPC request: {"secrets":"***stripped***","volume_id":"csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086"}
I0214 20:42:15.347633       1 hashed.go:47] hashedKeyMutex.LockKey(...) called for id "csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086"

Once (A) is successfully done (along with the deletion of the image AND the metadata stored in the configmap), it releases the lock, so that (B) may continue on its merry way:

I0214 20:42:35.647674       1 k8scmcache.go:177] k8s-cm-cache: successfully deleted metadata configmap csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086
I0214 20:42:35.653706       1 hashed.go:54] hashedKeyMutex.UnlockKey(...) called for id "csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086"
I0214 20:42:35.653738       1 hashed.go:56] hashedKeyMutex.UnlockKey(...) for id "csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086" completed.
I0214 20:42:35.653759       1 utils.go:103] GRPC response: {}
I0214 20:42:35.654202       1 hashed.go:49] hashedKeyMutex.LockKey(...) for id "csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086" completed.
I0214 20:42:35.662184       1 hashed.go:54] hashedKeyMutex.UnlockKey(...) called for id "csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086"

Since the metadata is now gone, (B) immediately exits because it cannot continue without it:

I0214 20:42:35.662446       1 hashed.go:56] hashedKeyMutex.UnlockKey(...) for id "csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086" completed.
E0214 20:42:35.667241       1 utils.go:101] GRPC error: configmaps "csi-rbd-vol-beb98884-3098-11e9-8e07-0a580af40086" not found

And that's what the external-provisioner sees: the most recent "active" DeleteVolume RPC returned an error, and that is the source of truth here, regardless of the fact that (A) was successful in the end.

I'll create a new issue tomorrow for possible solutions to this situation.
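To make the race easier to follow, here is a self-contained sketch of the behaviour described above, using an in-memory map as a stand-in for the per-volume configmap metadata and a single mutex playing the role of the per-volume-ID keyed mutex (hashedKeyMutex in the logs); all names are illustrative, not the driver's actual code:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// volumeStore is an in-memory stand-in for the per-PV configmap metadata.
type volumeStore struct {
	mu   sync.Mutex      // plays the role of the per-volume-ID keyed mutex
	meta map[string]bool // volume ID -> metadata exists
}

// deleteVolume mimics the DeleteVolume handler: take the lock, look up the
// metadata, delete the backing image (simulated by a sleep), then delete
// the metadata.
func (s *volumeStore) deleteVolume(id string) error {
	s.mu.Lock()
	defer s.mu.Unlock()

	if !s.meta[id] {
		// This is what RPC (B) hits: (A) already removed the metadata.
		return fmt.Errorf("configmaps %q not found", id)
	}
	time.Sleep(50 * time.Millisecond) // simulate the slow rbd rm / ceph call
	delete(s.meta, id)                // metadata is gone once (A) succeeds
	return nil
}

func main() {
	s := &volumeStore{meta: map[string]bool{"csi-rbd-vol-xyz": true}}

	var wg sync.WaitGroup
	for _, rpc := range []string{"A", "B"} {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			err := s.deleteVolume("csi-rbd-vol-xyz")
			fmt.Printf("RPC (%s): err=%v\n", name, err)
		}(rpc)
	}
	wg.Wait()
	// One RPC returns nil; the one that acquires the lock second fails with
	// "not found" - the error the external-provisioner ends up seeing.
}
```

Run as-is, one goroutine succeeds and the other returns the "not found" error, mirroring how (B) reports failure even though (A) already deleted the volume.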

@Madhu-1
Collaborator Author

Madhu-1 commented Feb 15, 2019

@gman0 @rootfs don't we need to continue with volume deletion if the configmap is not found?
The CSI spec says all operations should be idempotent, even if the resource doesn't exist anymore.
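For reference, a minimal sketch of what such idempotent handling could look like (assuming a recent client-go with a context-aware Get; the function name deleteVolumeMetadata and the use of the volume ID as the configmap name are illustrative, not the driver's actual code):

```go
package metadata

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteVolumeMetadata illustrates idempotent handling of missing metadata:
// if the per-volume configmap is already gone, assume the volume was deleted
// by an earlier call and report success instead of an internal error.
func deleteVolumeMetadata(ctx context.Context, client kubernetes.Interface, namespace, volID string) (alreadyDeleted bool, err error) {
	_, err = client.CoreV1().ConfigMaps(namespace).Get(ctx, volID, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		// Metadata is gone: treat DeleteVolume as already satisfied.
		return true, nil
	}
	if err != nil {
		return false, err
	}
	// Metadata exists: the caller proceeds with the actual image/volume
	// deletion and then removes the configmap.
	return false, nil
}
```

As @gman0 notes above, the caveat is that this cannot distinguish "volume already deleted" from "metadata lost for a volume that still exists".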

@Madhu-1
Collaborator Author

Madhu-1 commented Mar 5, 2019

Tested and not reproducible now; closing this one.

Madhu-1 closed this as completed Mar 5, 2019