
kubevirt failed to encrypt storage classes using ceph-csi rbd #3945

Closed
am6737 opened this issue Jun 26, 2023 · 7 comments · Fixed by #3958
Labels: component/rbd (Issues related to RBD), question (Further information is requested)

Comments

@am6737

am6737 commented Jun 26, 2023

Describe the bug

When using an encrypted StorageClass (SC) to create an RBD device with ceph-csi and using it in a kubevirt virtual machine, the following error occurs during volume mapping:

Warning FailedMapVolume 64s (x16 over 18m) kubelet MapVolume.SetUpDevice failed for volume "pvc-275597f6-633b-45e8-b2d8-043732a0c7f8" : rpc error: code = Internal desc = rpc error: code = Internal desc = need resize check failed on devicePath /dev/mapper/luks-rbd-0001-0024-xxx-0000000000000002-e03af360-21b6-4b87-ad93-e3c9ef1aa91b and stagingPath /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-275597f6-633b-45e8-b2d8-043732a0c7f8/0001-0024-xxx-0000000000000002-e03af360-21b6-4b87-ad93-e3c9ef1aa91b, error: Could not parse fs info on given filesystem format: unknown data, probably partitions. Supported fs types are: xfs, ext3, ext4
Warning FailedMount 26s kubelet Unable to attach or mount volumes: unmounted volumes=[vm-sdisk1], unattached volumes=[public ephemeral-disks vm-sdisk1 container-disks libvirt-runtime sockets hotplug-disks private]: timed out waiting for the condition

Environment details

  • Image/version of Ceph CSI driver : Latest devel Branch
  • Helm chart version :
  • Kernel version : Linux 5.4.0-152-generic
  • Mounter used for mounting PVC (for cephFS its fuse or kernel. for rbd its
    krbd or rbd-nbd) :
  • Kubernetes cluster version : v1.26.0
  • Ceph cluster version : v17.2.6
  • kubevirt version: v0.59

Steps to reproduce

Steps to reproduce the behavior:

  1. Configure an encrypted StorageClass

user-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: storage-encryption-secret
stringData:
  encryptionPassphrase: test-encryption

kms-config.yaml

apiVersion: v1
kind: ConfigMap
data:
  config.json: |-
    {
      "secrets-metadata-test": {
          "encryptionKMSType": "metadata"
      },
      "user-ns-secrets-metadata-test": {
        "encryptionKMSType": "metadata",
        "secretName": "storage-encryption-secret",
        "secretNamespace": "default"
      },
      "user-secrets-metadata-test": {
        "encryptionKMSType": "metadata",
        "secretName": "storage-encryption-secret"
      }
    }
metadata:
  name: ceph-csi-encryption-kms-config

storageclass.yaml

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-cephrbd-sc-luks
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: xxx
  pool: xxx     
  imageFeatures: layering
  csi.storage.k8s.io/controller-expand-secret-name: csi-ceph-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: default
  csi.storage.k8s.io/node-stage-secret-name: csi-ceph-secret
  csi.storage.k8s.io/node-stage-secret-namespace: default
  csi.storage.k8s.io/provisioner-secret-name: csi-ceph-secret
  csi.storage.k8s.io/provisioner-secret-namespace: default
  csi.storage.k8s.io/fstype: ext4
  encrypted: "true"
  encryptionKMSID: "user-secrets-metadata-test"
  encryptionType: "block"
reclaimPolicy: Delete
allowVolumeExpansion: true

  2. Configure the virtual machine manifest

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: test-vm
spec:
  running: true
  template:
    spec:
      domain:
        resources:
          requests:
            memory: 512M
        devices:
          disks:
          - disk: 
              bus: virtio
            name: vm-sdisk1
          - disk:
              bus: virtio
            name: cloudinit-disk1
          interfaces:
          - name: default
            masquerade: {}
      networks:
      - name: default
        pod: {}
      volumes:
        - cloudInitNoCloud:
            userData: |-
              #cloud-config
              user: root
              password: root
              chpasswd: { expire: False }
          name: cloudinit-disk1
        - dataVolume:
            name: test-vm-pvc
          name: vm-sdisk1
  dataVolumeTemplates:
  - metadata:
      name: test-vm-pvc
    spec:
      storage:
        accessModes: ["ReadWriteOnce"]
        storageClassName: csi-cephrbd-sc-luks
        resources:
          requests:
            storage: 10Gi
      source:
        http:
          url: https://download.fedoraproject.org/pub/fedora/linux/releases/36/Cloud/x86_64/images/Fedora-Cloud-Base-36-1.5.x86_64.raw.xz

Actual results

Warning FailedMapVolume 64s (x16 over 18m) kubelet MapVolume.SetUpDevice failed for volume "pvc-275597f6-633b-45e8-b2d8-043732a0c7f8" : rpc error: code = Internal desc = rpc error: code = Internal desc = need resize check failed on devicePath /dev/mapper/luks-rbd-0001-0024-xxx-0000000000000002-e03af360-21b6-4b87-ad93-e3c9ef1aa91b and stagingPath /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-275597f6-633b-45e8-b2d8-043732a0c7f8/0001-0024-xxx-0000000000000002-e03af360-21b6-4b87-ad93-e3c9ef1aa91b, error: Could not parse fs info on given filesystem format: unknown data, probably partitions. Supported fs types are: xfs, ext3, ext4

Expected behavior

successfully mounted pvc to the virtual machine

Logs

If the issue is in PVC mounting, please attach complete logs of the containers below.

  • csi-rbdplugin/csi-cephfsplugin and driver-registrar container logs from
    plugin pod from the node where the mount is failing.

csi-rbdplugin log

I0626 22:32:04.824772    1943 utils.go:195] ID: 112785 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0626 22:32:04.824879    1943 utils.go:206] ID: 112785 GRPC request: {}
I0626 22:32:04.825135    1943 utils.go:212] ID: 112785 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":5}}}]}
I0626 22:32:04.832147    1943 utils.go:195] ID: 112786 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0626 22:32:04.832352    1943 utils.go:206] ID: 112786 GRPC request: {}
I0626 22:32:04.832575    1943 utils.go:212] ID: 112786 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":5}}}]}
I0626 22:32:04.834601    1943 utils.go:195] ID: 112787 Req-ID: 0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 GRPC call: /csi.v1.Node/NodeStageVolume
I0626 22:32:04.834867    1943 utils.go:206] ID: 112787 Req-ID: 0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-275597f6-633b-45e8-b2d8-043732a0c7f8","volume_capability":{"AccessType":{"Block":{}},"access_mode":{"mode":7}},"volume_context":{"clusterID":"xxx","encrypted":"true","encryptionKMSID":"user-secrets-metadata-test","encryptionType":"block","imageFeatures":"layering","imageName":"csi-vol-fe89f246-0ced-4b4d-9399-a70c86411944","journalPool":"kubernetes","pool":"kubernetes","storage.kubernetes.io/csiProvisionerIdentity":"1687146648699-640-rbd.csi.ceph.com"},"volume_id":"0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944"}
I0626 22:32:04.840638    1943 omap.go:88] ID: 112787 Req-ID: 0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 got omap values: (pool="kubernetes", namespace="", name="csi.volume.fe89f246-0ced-4b4d-9399-a70c86411944"): map[csi.imageid:6093b7ad1f6058 csi.imagename:csi-vol-fe89f246-0ced-4b4d-9399-a70c86411944 csi.volname:pvc-275597f6-633b-45e8-b2d8-043732a0c7f8 csi.volume.encryptKMS:user-secrets-metadata-test csi.volume.encryptionType:block csi.volume.owner:default]
I0626 22:32:04.927482    1943 rbd_util.go:352] ID: 112787 Req-ID: 0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 checking for ImageFeatures: [layering]
I0626 22:32:05.016071    1943 cephcmds.go:105] ID: 112787 Req-ID: 0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 command succeeded: rbd [device list --format=json --device-type krbd]
I0626 22:32:05.047851    1943 rbd_attach.go:420] ID: 112787 Req-ID: 0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 rbd: map mon 10.10.10.58:6789,10.10.10.59:6789,10.10.10.60:6789
I0626 22:32:05.287561    1943 cephcmds.go:105] ID: 112787 Req-ID: 0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 command succeeded: rbd [--id admin -m 10.10.10.58:6789,10.10.10.59:6789,10.10.10.60:6789 --keyfile=***stripped*** map kubernetes/csi-vol-fe89f246-0ced-4b4d-9399-a70c86411944 --device-type krbd --options noudev]
I0626 22:32:05.287642    1943 nodeserver.go:425] ID: 112787 Req-ID: 0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 rbd image: kubernetes/csi-vol-fe89f246-0ced-4b4d-9399-a70c86411944 was successfully mapped at /dev/rbd3
I0626 22:32:05.318656    1943 encryption.go:87] ID: 112787 Req-ID: 0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 image kubernetes/csi-vol-fe89f246-0ced-4b4d-9399-a70c86411944 encrypted state metadata reports "encrypted"
I0626 22:32:05.674190    1943 crypto.go:320] ID: 112787 Req-ID: 0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 "/dev/mapper/luks-rbd-0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944" is not an active LUKS device (an error (exit status 4) occurred while running cryptsetup args: [status luks-rbd-0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944]): 
I0626 22:32:05.674247    1943 crypto.go:272] ID: 112787 Req-ID: 0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 Opening device "/dev/rbd3" with LUKS on "luks-rbd-0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944"
E0626 22:32:07.925764    1943 crypto.go:275] ID: 112787 Req-ID: 0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 failed to open device "/dev/rbd3" (<nil>): DM-UUID for device luks-rbd-0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 was truncated.
I0626 22:32:07.926004    1943 mount_linux.go:566] Attempting to determine if disk "/dev/mapper/luks-rbd-0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944" is formatted using blkid with args: ([-p -s TYPE -s PTTYPE -o export /dev/mapper/luks-rbd-0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944])
I0626 22:32:07.975152    1943 mount_linux.go:569] Output: "DEVNAME=/dev/mapper/luks-rbd-0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944\nPTTYPE=gpt\n"
I0626 22:32:07.975235    1943 mount_linux.go:607] Disk /dev/mapper/luks-rbd-0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 detected partition table type: gpt
I0626 22:32:07.975330    1943 mount_linux.go:219] Mounting cmd (mount) with arguments ( -o bind,_netdev /dev/mapper/luks-rbd-0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-275597f6-633b-45e8-b2d8-043732a0c7f8/0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944)
I0626 22:32:07.982980    1943 mount_linux.go:219] Mounting cmd (mount) with arguments ( -o bind,remount,_netdev /dev/mapper/luks-rbd-0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-275597f6-633b-45e8-b2d8-043732a0c7f8/0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944)
I0626 22:32:07.991220    1943 cephcmds.go:105] ID: 112787 Req-ID: 0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 command succeeded: blockdev [--getsize64 /dev/rbd3]
I0626 22:32:07.992775    1943 cephcmds.go:105] ID: 112787 Req-ID: 0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 command succeeded: blockdev [--getsize64 /dev/mapper/luks-rbd-0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944]
I0626 22:32:07.992873    1943 crypto.go:283] ID: 112787 Req-ID: 0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 Resizing LUKS device "/dev/mapper/luks-rbd-0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944"
I0626 22:32:08.023339    1943 mount_linux.go:566] Attempting to determine if disk "/dev/mapper/luks-rbd-0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944" is formatted using blkid with args: ([-p -s TYPE -s PTTYPE -o export /dev/mapper/luks-rbd-0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944])
I0626 22:32:08.034829    1943 mount_linux.go:569] Output: "DEVNAME=/dev/mapper/luks-rbd-0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944\nPTTYPE=gpt\n"
I0626 22:32:08.034918    1943 mount_linux.go:607] Disk /dev/mapper/luks-rbd-0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 detected partition table type: gpt
I0626 22:32:08.034934    1943 resizefs_linux.go:124] ResizeFs.needResize - checking mounted volume /dev/mapper/luks-rbd-0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944
E0626 22:32:08.034945    1943 resizefs_linux.go:136] Not able to parse given filesystem info. fsType: unknown data, probably partitions, will not resize
I0626 22:32:08.035001    1943 mount_linux.go:361] Unmounting /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-275597f6-633b-45e8-b2d8-043732a0c7f8/0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944
I0626 22:32:08.063260    1943 crypto.go:294] ID: 112787 Req-ID: 0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 Closing LUKS device "luks-rbd-0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944"
I0626 22:32:08.261651    1943 cephcmds.go:105] ID: 112787 Req-ID: 0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 command succeeded: rbd [unmap /dev/rbd3 --device-type krbd --options noudev]
E0626 22:32:08.262107    1943 utils.go:210] ID: 112787 Req-ID: 0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 GRPC error: rpc error: code = Internal desc = rpc error: code = Internal desc = need resize check failed on devicePath /dev/mapper/luks-rbd-0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 and staingPath /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-275597f6-633b-45e8-b2d8-043732a0c7f8/0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944, error: Could not parse fs info on given filesystem format: unknown data, probably partitions. Supported fs types are: xfs, ext3, ext4
I0626 22:32:26.223974    1943 utils.go:195] ID: 112788 GRPC call: /csi.v1.Identity/Probe
I0626 22:32:26.224104    1943 utils.go:206] ID: 112788 GRPC request: {}
I0626 22:32:26.224150    1943 utils.go:212] ID: 112788 GRPC response: {}
I0626 22:32:55.938806    1943 utils.go:195] ID: 112789 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0626 22:32:55.938946    1943 utils.go:206] ID: 112789 GRPC request: {}
I0626 22:32:55.939215    1943 utils.go:212] ID: 112789 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":5}}}]}
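The blkid output in the log above (`DEVNAME=...` and `PTTYPE=gpt`, with no `TYPE` key) is exactly why the resize check fails: the decrypted device carries a partition table rather than a bare filesystem. A minimal Go sketch of how that `-o export` style output can be interpreted — illustrative only, with a hypothetical `parseBlkidExport` helper; the real parsing lives in k8s.io/mount-utils:

```go
package main

import (
	"fmt"
	"strings"
)

// parseBlkidExport parses `blkid -o export` style KEY=VALUE output into a map.
// Hypothetical helper for illustration; not the actual mount-utils code.
func parseBlkidExport(out string) map[string]string {
	kv := map[string]string{}
	for _, line := range strings.Split(strings.TrimSpace(out), "\n") {
		if k, v, ok := strings.Cut(line, "="); ok {
			kv[k] = v
		}
	}
	return kv
}

func main() {
	// Output shape captured in the log above: no TYPE, only a partition table.
	out := "DEVNAME=/dev/mapper/luks-rbd-xxx\nPTTYPE=gpt\n"
	kv := parseBlkidExport(out)
	// No filesystem TYPE and PTTYPE=gpt: the device holds a partition table,
	// so a filesystem resizer has nothing it can safely operate on.
	fmt.Println(kv["TYPE"] == "" && kv["PTTYPE"] == "gpt") // prints true
}
```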

csi-rbdplugin driver-registrar log

I0619 03:50:24.225070    1906 main.go:167] Version: v2.8.0
I0619 03:50:24.227222    1906 main.go:168] Running node-driver-registrar in mode=registration
I0619 03:50:24.230617    1906 main.go:192] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0619 03:50:26.291694    1906 main.go:199] Calling CSI driver to discover driver name
I0619 03:50:26.327572    1906 node_register.go:53] Starting Registration Server at: /registration/rbd.csi.ceph.com-reg.sock
I0619 03:50:26.328373    1906 node_register.go:62] Registration Server started at: /registration/rbd.csi.ceph.com-reg.sock
I0619 03:50:26.331201    1906 node_register.go:92] Skipping HTTP server because endpoint is set to: ""
I0619 03:50:26.629608    1906 main.go:102] Received GetInfo call: &InfoRequest{}
I0619 03:50:26.630263    1906 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/rbd.csi.ceph.com/registration"
I0619 03:50:29.398778    1906 main.go:121] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}
  • If required, attach dmesg logs.

Note: if it is an RBD issue, please provide only RBD-related logs; if it is a
CephFS issue, please provide CephFS logs.

Additional context

Add any other context about the problem here.

For example: any existing bug report which describes a similar issue or behavior.

@nixpanic
Member

Could you add the following annotation to the StorageClass and try again?

cdi.kubevirt.io/clone-strategy=copy

@nixpanic nixpanic added question Further information is requested component/rbd Issues related to RBD labels Jun 29, 2023
@am6737
Author

am6737 commented Jun 29, 2023

Could you add the following annotation to the StorageClass and try again?

cdi.kubevirt.io/clone-strategy=copy

Thank you for the information.
I tried it, but it did not solve the problem.

@nixpanic
Member

The volume should be of AccessType Block (as opposed to Filesystem) for VMs; that seems fine:

I0626 22:32:04.834601    1943 utils.go:195] ID: 112787 Req-ID: 0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 GRPC call: /csi.v1.Node/NodeStageVolume
I0626 22:32:04.834867    1943 utils.go:206] ID: 112787 Req-ID: 0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-275597f6-633b-45e8-b2d8-043732a0c7f8","volume_capability":{"AccessType":{"Block":{}},"access_mode":{"mode":7}},"volume_context":{"clusterID":"xxx","encrypted":"true","encryptionKMSID":"user-secrets-metadata-test","encryptionType":"block","imageFeatures":"layering","imageName":"csi-vol-fe89f246-0ced-4b4d-9399-a70c86411944","journalPool":"kubernetes","pool":"kubernetes","storage.kubernetes.io/csiProvisionerIdentity":"1687146648699-640-rbd.csi.ceph.com"},"volume_id":"0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944"}

Mapping the rbd-image and decrypting seems to work too, but resizing fails:

I0626 22:32:07.992873    1943 crypto.go:283] ID: 112787 Req-ID: 0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 Resizing LUKS device "/dev/mapper/luks-rbd-0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944"
I0626 22:32:08.023339    1943 mount_linux.go:566] Attempting to determine if disk "/dev/mapper/luks-rbd-0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944" is formatted using blkid with args: ([-p -s TYPE -s PTTYPE -o export /dev/mapper/luks-rbd-0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944])
I0626 22:32:08.034829    1943 mount_linux.go:569] Output: "DEVNAME=/dev/mapper/luks-rbd-0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944\nPTTYPE=gpt\n"
I0626 22:32:08.034918    1943 mount_linux.go:607] Disk /dev/mapper/luks-rbd-0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944 detected partition table type: gpt
I0626 22:32:08.034934    1943 resizefs_linux.go:124] ResizeFs.needResize - checking mounted volume /dev/mapper/luks-rbd-0001-0024-xxx-0000000000000002-fe89f246-0ced-4b4d-9399-a70c86411944
E0626 22:32:08.034945    1943 resizefs_linux.go:136] Not able to parse given filesystem info. fsType: unknown data, probably partitions, will not resize

Because the volume has AccessType=Block, a resize of the filesystem (if any) on the block device should not be attempted. This could be a bug in the NodeStageVolume procedure.

@nixpanic
Member

I think this could make it work:

diff --git a/internal/rbd/nodeserver.go b/internal/rbd/nodeserver.go
index 825f9bc61..7fcdb8166 100644
--- a/internal/rbd/nodeserver.go
+++ b/internal/rbd/nodeserver.go
@@ -512,6 +512,11 @@ func resizeNodeStagePath(ctx context.Context,
                if err != nil {
                        return status.Error(codes.Internal, err.Error())
                }
+
+               // if this is a AccessType=Block volume, do not attempt filesystem resize
+               if isBlock {
+                       return nil
+               }
        }
        // check stagingPath needs resize.
        ok, err = resizer.NeedResize(devicePath, stagingTargetPath)

Are you comfortable applying that change to the devel branch and building a test image? If not, I can create an image for you too (likely tomorrow or next week).

@am6737
Author

am6737 commented Jun 29, 2023

I think this could make it work:

diff --git a/internal/rbd/nodeserver.go b/internal/rbd/nodeserver.go
index 825f9bc61..7fcdb8166 100644
--- a/internal/rbd/nodeserver.go
+++ b/internal/rbd/nodeserver.go
@@ -512,6 +512,11 @@ func resizeNodeStagePath(ctx context.Context,
                if err != nil {
                        return status.Error(codes.Internal, err.Error())
                }
+
+               // if this is a AccessType=Block volume, do not attempt filesystem resize
+               if isBlock {
+                       return nil
+               }
        }
        // check stagingPath needs resize.
        ok, err = resizer.NeedResize(devicePath, stagingTargetPath)

Are you comfortable applying that change to the devel branch and building a test image? If not, I can create an image for you too (likely tomorrow or next week).

Yes, I modified and tested according to your suggestion, and my virtual machine worked.
I still have some doubts about the cause of this behavior. Can you explain it to me?

@nixpanic
Member

The issue that you have seen is related to the following, rather uncommon (but perfectly valid) scenario:

  • AccessType = Block volumes: the Pod gets a /dev/my-volume device-node, without filesystem
  • encryptionType = block: encryption with LUKS/cryptsetup, not filesystem encryption with fscrypt

This particular case was not handled in the resizeNodeStagePath() function. The resize is needed for the block device with LUKS encryption (with cryptsetup resize). A filesystem resize is not wanted in this case, because the application (VM) manages the data on the block device. Ceph-CSI should not try to inspect the data on an AccessType=Block volume; that is expected to be completely handled by the application. Here, the VM has a partition table on the disk, which is not expected by resizefs, and it fails because of it.
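The decision described above can be modeled as a tiny Go sketch — an illustrative model with a hypothetical `resizeActions` function, not the actual ceph-csi code: the LUKS mapping must always be grown (`cryptsetup resize`), but a filesystem resize only makes sense for AccessType=Filesystem staging.

```go
package main

import "fmt"

// resizeActions models the fix's decision in resizeNodeStagePath():
// for an encrypted volume the dm-crypt/LUKS mapping must be resized so the
// mapping covers the grown RBD image, but a filesystem resize is only
// meaningful when the volume is staged as a filesystem. For AccessType=Block
// the application (here, the VM) owns whatever is on the device.
// Hypothetical helper for illustration only.
func resizeActions(isBlock, isEncrypted bool) (resizeLUKS, resizeFS bool) {
	resizeLUKS = isEncrypted
	resizeFS = !isBlock
	return
}

func main() {
	// The failing scenario from this issue: Block volume, LUKS-encrypted.
	luks, fs := resizeActions(true, true)
	fmt.Println(luks, fs) // prints "true false": grow the LUKS mapping, leave the GPT disk alone
}
```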

nixpanic added a commit to nixpanic/ceph-csi that referenced this issue Jun 30, 2023
When a volume has AccessType=Block and is encrypted with LUKS, a resize
of the filesystem on the (decrypted) block-device is attempted. This
should not be done, as the application that requested the Block volume
is the only authoritative reader/writer of the data.

In particular VirtualMachines that use RBD volumes as a disk, usually
have a partition table on the disk, instead of only a single filesystem.
The `resizefs` command will not be able to resize the filesystem on the
block-device, as it is a partition table.

When `resizefs` fails during NodeStageVolume, the volume is unstaged and
an error is returned.

Resizing an encrypted block-device requires `cryptsetup resize` so that
the LUKS header on the RBD-image is updated with the correct size. But
there is no need to call `resizefs` in this case.

Fixes: ceph#3945
Signed-off-by: Niels de Vos <ndevos@ibm.com>
@nixpanic nixpanic self-assigned this Jun 30, 2023
nixpanic added a commit to nixpanic/ceph-csi that referenced this issue Jun 30, 2023
@am6737
Author

am6737 commented Jun 30, 2023

The issue that you have seen is related to the following, rather uncommon (but perfectly valid) scenario:

  • AccessType = Block volumes: the Pod gets a /dev/my-volume device-node, without filesystem
  • encryptionType = block: encryption with LUKS/cryptsetup, not filesystem encryption with fscrypt

This particular case was not handled in the resizeNodeStagePath() function. The resize is needed for the block device with LUKS encryption (with cryptsetup resize). Filesystem resize is not wanted in this case, because the application (VM) manages the data on the block-device. Ceph-CSI should not try to inspect the data on a AccessType=Block volume, that is expected to be completely handled by the application. Here, the VM has a partition-table on the disk, and that is not expected by resizefs, which fails because of it.

Thank you for everything you've done on this.

@am6737 am6737 closed this as completed Jun 30, 2023
nixpanic added a commit to nixpanic/ceph-csi that referenced this issue Jul 3, 2023
mergify bot pushed a commit that referenced this issue Jul 3, 2023
mergify bot pushed a commit that referenced this issue Jul 3, 2023
nixpanic added a commit that referenced this issue Jul 4, 2023
mergify bot pushed a commit that referenced this issue Jul 4, 2023