CephFS mount syntax not updated for Quincy #3309

Closed
ygg-drop opened this issue Aug 16, 2022 · 34 comments
Labels: dependency/nomad, wontfix (This will not be worked on)

Comments

@ygg-drop

Describe the bug

Apparently there was a significant change in the mount.ceph syntax between Ceph Pacific and Quincy; however, the Ceph-CSI code does not appear to have been updated to support the new syntax.

I use Nomad 1.3.1 and I am trying to use Ceph-CSI to provide CephFS-based volumes to Nomad jobs. I tried the 3.6.2 version of Ceph-CSI (which is already based on Quincy) to mount a CephFS volume from a cluster running Ceph 17.2.0.

I use Nomad instead of Kubernetes, but I don't think this fact affects this bug.

Environment details

  • Image/version of Ceph CSI driver : 3.6.2
  • Helm chart version : N/A
  • Kernel version : 5.15.41-0-lts
  • Mounter used for mounting PVC (for CephFS it's fuse or kernel; for RBD it's
    krbd or rbd-nbd) : kernel
  • Kubernetes cluster version : N/A
  • Ceph cluster version : 17.2.0

Steps to reproduce

Steps to reproduce the behavior:

  1. Set up Nomad 1.3.x (it can run in dev mode) and Ceph 17.2
  2. In Ceph, create a CephFS called nomadfs and an admin user (a sketch of the Ceph-side commands follows this list)
  3. Deploy CSI Controller Plugin job using: nomad job run ceph-csi-plugin-controller.nomad
  4. Deploy CSI Node Plugin job using: nomad job run ceph-csi-plugin-nodes.nomad
  5. Deploy sample-fs-volume.hcl by running: nomad volume register sample-fs-volume.hcl
  6. Deploy mysql-fs.nomad, which tries to use the volume registered in the previous step, using: nomad job run mysql-fs.nomad.
  7. Observe error in ceph-mysql-fs job allocation logs.
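
For step 2, a minimal sketch of the Ceph-side setup, assuming a healthy cluster with spare MDS capacity (the issue reuses the existing client.admin user rather than creating a dedicated CSI user):

# create the filesystem that the volume definition below refers to
ceph fs volume create nomadfs
# the clusterID used in the CSI config is the cluster fsid
ceph fsid
# the key that goes into the volume's secrets block
ceph auth get-key client.admin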

ceph-csi-plugin-controller.nomad:

job "ceph-fs-csi-plugin-controller" {
  datacenters = ["dc1"]
 
  group "controller" {
    network {
      port "metrics" {}
    }

    task "ceph-controller" {
      driver = "docker"

      template {
        data = jsonencode([{
          clusterID = "67b72852-d1b8-45ad-b1f8-edb8c150ff9b"
          monitors  = ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
        }])
        destination = "local/config.json"
        change_mode = "restart"
      }

      config {
        image       = "quay.io/cephcsi/cephcsi:v3.6.2"
        volumes      = [
          "./local/config.json:/etc/ceph-csi-config/config.json"
        ]

        mounts = [
          {
            type          = "tmpfs"
            target        = "/tmp/csi/keys"
            readonly      = false
            tmpfs_options = {
              size = 1000000 # size in bytes
            }
          }
        ]

        args = [
          "--type=cephfs",
          "--controllerserver=true",
          "--drivername=cephfs.csi.ceph.com",
          "--endpoint=unix://csi/csi.sock",
          "--nodeid=${node.unique.name}",
          "--instanceid=${node.unique.name}-controller",
          "--pidlimit=-1",
          "--logtostderr=true",
          "--v=5",
          "-stderrthreshold=0",
          "--metricsport=$${NOMAD_PORT_metrics}"
        ]
      }

      resources {
        cpu    = 500
        memory = 256
      }

      service {
        name     = "ceph-fs-csi-controller"
        port     = "metrics"
        tags     = [ "prometheus" ]
      }

      csi_plugin {
        id        = "ceph-fs-csi"
        type      = "controller"
        mount_dir = "/csi"
      }
    }
  }
}

ceph-csi-plugin-nodes.nomad:

job "ceph-fs-csi-plugin-nodes" {
  datacenters = ["dc1"]
  type        = "system"

  group "nodes" {
    network {
      port "metrics" {}
    }

    task "ceph-node" {
      driver = "docker"

      template {
        data = jsonencode([{
          clusterID = "67b72852-d1b8-45ad-b1f8-edb8c150ff9b"
          monitors  = ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
        }])
        destination = "local/config.json"
        change_mode = "restart"
      }

      config {
        mount {
          type          = "tmpfs"
          target        = "/tmp/csi/keys"
          readonly      = false
          tmpfs_options = {
            size = 1000000 # size in bytes
          }
        }

        mount {
          type     = "bind"
          source   = "/lib/modules/${attr.kernel.version}"
          target   = "/lib/modules/${attr.kernel.version}"
          readonly = true
        }
       
        image       = "quay.io/cephcsi/cephcsi:v3.6.2"
        privileged  = true
        volumes     = [
          "./local/config.json:/etc/ceph-csi-config/config.json"
        ]
        args = [
          "--type=cephfs",
          "--drivername=cephfs.csi.ceph.com",
          "--nodeserver=true",
          "--endpoint=unix://csi/csi.sock",
          "--nodeid=${node.unique.name}",
          "--instanceid=${node.unique.name}-nodes",
          "--pidlimit=-1",
          "--logtostderr=true",
          "--v=5",
          "--metricsport=$${NOMAD_PORT_metrics}"
        ]
      }

      resources {
        cpu    = 500
        memory = 256
      }

      service {
        name = "ceph-fs-csi-nodes"
        port = "metrics"
        tags = [ "prometheus" ]
      }

      csi_plugin {
        id        = "ceph-fs-csi"
        type      = "node"
        mount_dir = "/csi"
      }
    }
  }
}

sample-fs-volume.hcl:

id           = "ceph-mysql-fs"
name         = "ceph-mysql-fs"
type         = "csi"
plugin_id    = "ceph-fs-csi"
external_id  = "nomadfs"

capability {
  access_mode     = "multi-node-multi-writer"
  attachment_mode = "file-system"
}

secrets {
  adminID  = "admin"
  adminKey = "AQDKpPtiDr30NRAAsqtMLh0WHUqZ0L4f2S/ouA=="
  userID  = "admin"
  userKey = "AQDKpPtiDr30NRAAsqtMLh0WHUqZ0L4f2S/ouA=="
}

parameters {
  clusterID = "67b72852-d1b8-45ad-b1f8-edb8c150ff9b"
  fsName    = "nomadfs"
}

context {
  monitors  = "192.168.1.10,192.168.1.11,192.168.1.12"
  provisionVolume = "false"
  rootPath = "/"
}

mysql-fs.nomad:

variable "mysql_root_password" {
  description = "Password for MySQL root user"
  type = string
  default = "password"
}

job "mysql-server-fs" {
  datacenters = ["dc1"]
  type        = "service"

  group "mysql-server-fs" {
    count = 1
    volume "ceph-mysql-fs" {
      type      = "csi"
      attachment_mode = "file-system"
      access_mode     = "multi-node-multi-writer"
      read_only = false
      source    = "ceph-mysql-fs"
    }
    network {
      port "db" {
        static = 3306
      }
    }
    restart {
      attempts = 10
      interval = "5m"
      delay    = "25s"
      mode     = "delay"
    }
    task "mysql-server" {
      driver = "docker"
      volume_mount {
        volume      = "ceph-mysql-fs"
        destination = "/srv"
        read_only   = false
      }
      env {
        MYSQL_ROOT_PASSWORD = "${var.mysql_root_password}"
      }
      config {
        image = "hashicorp/mysql-portworx-demo:latest"
        args  = ["--datadir", "/srv/mysql"]
        ports = ["db"]
      }
      resources {
        cpu    = 500
        memory = 1024
      }
      service {
        provider = "nomad"
        name = "mysql-server"
        port = "db"
      }
    }
  }
}

Actual results

Ceph-CSI node plugin failed to mount CephFS.

Expected behavior

Ceph-CSI node plugin should successfully mount CephFS using the new mount.ceph syntax.

Logs

nomad alloc status events:

Recent Events:
Time                       Type           Description
2022-08-16T12:32:18+02:00  Setup Failure  failed to setup alloc: pre-run hook "csi_hook" failed: node plugin returned an internal error, check the plugin allocation logs for more information: rpc error: code = Internal desc = an error (exit status 32) occurred while running mount args: [-t ceph 192.168.1.10,192.168.1.11,192.168.1.12:/ /local/csi/staging/ceph-mysql-fs/rw-file-system-multi-node-multi-writer -o name=admin,secretfile=/tmp/csi/keys/keyfile-2337295656,_netdev] stderr: unable to get monitor info from DNS SRV with service name: ceph-mon
2022-08-16T10:31:57.974+0000 7fdda8b9df40 -1 failed for service _ceph-mon._tcp
mount error: no mds server is up or the cluster is laggy

I suspect the "unable to get monitor info from DNS SRV" error happens because the mount.ceph helper in 17.x no longer recognizes monitor IPs passed this way in the device string and falls back to using DNS SRV records.
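
For reference, the two device-string forms look roughly like this (monitor addresses, key paths, and the cluster fsid here are placeholders; the Quincy mount.ceph man page has the authoritative syntax):

# pre-Quincy syntax: the monitor list goes in the device string
mount -t ceph 192.168.1.10,192.168.1.11,192.168.1.12:/ /mnt -o name=admin,secretfile=/path/to/keyfile,fs=nomadfs

# Quincy syntax: <user>@<fsid>.<fsname>=<path>, monitors passed via mon_addr
mount -t ceph admin@<cluster-fsid>.nomadfs=/ /mnt -o secretfile=/path/to/keyfile,mon_addr=192.168.1.10/192.168.1.11/192.168.1.12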

Additional context

@Madhu-1
Collaborator

Madhu-1 commented Aug 17, 2022

AFAIK it should not be an issue; we are using cephcsi with Quincy, and we don't have this issue reported from anyone else. @ygg-drop, a couple of questions:

  • Is your ceph cluster healthy?
  • Have you retried running the mount command manually from the cephfsplugin container?
  • -t ceph 192.168.1.10,192.168.1.11,192.168.1.12:/ Have you tried specifying the monitor port?

@ygg-drop
Author

ygg-drop commented Aug 17, 2022

Someone has to make the first report 😉 Maybe this use case is just not very popular?

Is your ceph cluster healthy?

Yes.

Have you retried running the mount command manually from the cephfsplugin container?

Yes, I get the same error:

$ docker exec -ti ceac87b3d9fe bash
[root@ceac87b3d9fe /]# echo 'AQDKpPtiDr30NRAAsqtMLh0WHUqZ0L4f2S/ouA==' > /tmp/csi/keys/admin.key
[root@ceac87b3d9fe /]# mount -t ceph 192.168.1.10,192.168.1.11,192.168.1.12:/ /mnt -o 'name=admin,secretfile=/tmp/csi/keys/admin.key,_netdev,fs=nomadfs'
unable to get monitor info from DNS SRV with service name: ceph-mon
2022-08-17T08:12:46.934+0000 7fceff0def40 -1 failed for service _ceph-mon._tcp
[root@ceac87b3d9fe /]# 

-t ceph 192.168.1.10,192.168.1.11,192.168.1.12:/ Have you tried specifying the monitor port?

Yes, same error:

[root@ceac87b3d9fe /]# mount -t ceph 192.168.1.10:6789,192.168.1.11:6789,192.168.1.12:6789:/ /mnt -o 'name=admin,secretfile=/tmp/csi/keys/admin.key,_netdev,fs=nomadfs'
unable to get monitor info from DNS SRV with service name: ceph-mon
2022-08-17T08:12:46.934+0000 7fceff0def40 -1 failed for service _ceph-mon._tcp
[root@ceac87b3d9fe /]# 

When I try to mount using the Quincy mount.ceph syntax it works:

[root@ceac87b3d9fe /]# mount -t ceph admin@67b72852-d1b8-45ad-b1f8-edb8c150ff9b.nomadfs=/ /mnt -o 'secretfile=/tmp/csi/keys/admin.key,_netdev,mon_addr=192.168.1.10/192.168.1.11/192.168.1.12'
[root@ceac87b3d9fe /]# ls -la /mnt
total 4
drwxr-xr-x 3 root root    1 Aug 12 08:41 .
drwxr-xr-x 1 root root 4096 Aug 17 08:07 ..
drwxr-xr-x 3 root root    2 Aug 12 08:41 volumes
[root@ceac87b3d9fe /]#

EDIT:

I just tested with quay.io/cephcsi/cephcsi:v3.5.1 (which is based on Pacific) and the mount commands which failed previously do work there.

@Informize

I have exactly the same problem with a basic k8s installation (1.25) and Ceph installation (17.2.3): cephcsi:v3.5.1 works fine and cephcsi:v3.7.1 fails.

@Madhu-1
Collaborator

Madhu-1 commented Sep 14, 2022

@mchangir can you please help here? I'm not sure why mounting fails: cephcsi uses Ceph 17.2 as the base image, but the mount still looks like it is failing against the 17.2.3 cluster.

Note: we have not seen this issue in Rook Ceph clusters.

@Madhu-1
Collaborator

Madhu-1 commented Sep 14, 2022

@Informize can you please provide the dmesg output from the node?

@Madhu-1
Collaborator

Madhu-1 commented Sep 14, 2022

@Informize can you also run the mount command in verbose mode?

@humblec
Collaborator

humblec commented Sep 14, 2022

@ygg-drop I am wondering whether a compatibility mismatch between the userspace packages (e.g. ceph-common) and the kernel (5.15.41-0-lts) is causing the issue here.
With that assumption: are the same versions installed on all your cluster nodes? Does the mount fail on all the nodes, or only on specific ones?

@vshankar

The mount syntax changes have been kept backward compatible. The old syntax should work with newer kernels.

@vshankar

@ygg-drop dmesg output and running the mount helper with the verbose flag would help debug what's going on.

@Informize

@Madhu-1

OS on ceph nodes and k8s control/worker nodes are all:

Linux control-11 5.4.0-125-generic #141-Ubuntu SMP Wed Aug 10 13:42:03 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Version of cephcsi that works:

kubectl exec -it csi-cephfsplugin-4pkct -c csi-cephfsplugin bash
# /usr/local/bin/cephcsi --version
Cephcsi Version: v3.5-canary
Git Commit: c374edcbaa2a5dd364a9d526728e1629cd666a82
Go Version: go1.17.5
Compiler: gc
Platform: linux/amd64
Kernel: 5.4.0-125-generic

and mount command:

[root@worker-12 /]# mount -v -t ceph 192.168.21.11,192.168.21.12,192.168.21.13:/ /mnt -o 'name=admin,secretfile=/tmp/auth.key,_netdev,fs=cephfs'
parsing options: rw,name=admin,secretfile=/tmp/auth.key,fs=cephfs,_netdev
mount.ceph: options "name=admin,mds_namespace=cephfs" will pass to kernel.
[root@worker-12 /]# df -h /mnt
Filesystem                                   Size  Used Avail Use% Mounted on
192.168.21.11,192.168.21.12,192.168.21.13:/  373G     0  373G   0% /mnt

With ceph-csi 3.7

[root@worker-12 /]# /usr/local/bin/cephcsi --version
Cephcsi Version: v3.7-canary
Git Commit: 468c73d2b61a955503bd82e083b209f73e62a12e
Go Version: go1.18.5
Compiler: gc
Platform: linux/amd64
Kernel: 5.4.0-125-generic

And mount command in verbose mode:

[root@worker-12 /]# mount -v -t ceph 192.168.21.11,192.168.21.12,192.168.21.13:/ /mnt -o 'name=admin,secretfile=/tmp/auth.key,_netdev,fs=cephfs'
parsing options: rw,name=admin,secretfile=/tmp/auth.key,fs=cephfs,_netdev
mount.ceph: options "name=admin,mds_namespace=cephfs".
invalid new device string format
unable to get monitor info from DNS SRV with service name: ceph-mon
keyring.get_secret failed
2022-09-15T09:00:11.472+0000 7f0946866f40 -1 failed for service _ceph-mon._tcp
mount.ceph: resolved to: "192.168.21.11,192.168.21.12,192.168.21.13"
mount.ceph: trying mount with old device syntax: 192.168.21.11,192.168.21.12,192.168.21.13:/
mount.ceph: options "name=admin,mds_namespace=cephfs,key=admin,fsid=00000000-0000-0000-0000-000000000000" will pass to kernel

@vshankar

And mount command in verbose mode:

[root@worker-12 /]# mount -v -t ceph 192.168.21.11,192.168.21.12,192.168.21.13:/ /mnt -o 'name=admin,secretfile=/tmp/auth.key,_netdev,fs=cephfs'
parsing options: rw,name=admin,secretfile=/tmp/auth.key,fs=cephfs,_netdev
mount.ceph: options "name=admin,mds_namespace=cephfs".
invalid new device string format
unable to get monitor info from DNS SRV with service name: ceph-mon
keyring.get_secret failed
2022-09-15T09:00:11.472+0000 7f0946866f40 -1 failed for service _ceph-mon._tcp
mount.ceph: resolved to: "192.168.21.11,192.168.21.12,192.168.21.13"
mount.ceph: trying mount with old device syntax: 192.168.21.11,192.168.21.12,192.168.21.13:/
mount.ceph: options "name=admin,mds_namespace=cephfs,key=admin,fsid=00000000-0000-0000-0000-000000000000" will pass to kernel

Do you see no output after this or does the command hang? And does the mount go through (grep ceph /proc/mounts)?

Anything in dmesg? The 0's in fsid might be the issue here.
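
One way to cross-check the fsid theory is to read the real fsid from the cluster and pass it to the mount helper explicitly (a sketch; the mount arguments are taken from the earlier comments and the fsid value is a placeholder):

# on a node with the ceph CLI and an admin keyring
ceph fsid
# pass it explicitly so the helper does not fall back to the zeroed value
mount -v -t ceph 192.168.21.11,192.168.21.12,192.168.21.13:/ /mnt -o name=admin,secretfile=/tmp/auth.key,_netdev,fs=cephfs,fsid=<fsid-from-above>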

@Informize

Informize commented Sep 15, 2022

@vshankar
It hangs for a bit and then returns to the prompt.
Relevant dmesg messages:

[691243.945358] libceph: mon1 (1)192.168.21.12:6789 session established
[691243.945750] libceph: mon1 (1)192.168.21.12:6789 socket closed (con state OPEN)
[691243.945764] libceph: mon1 (1)192.168.21.12:6789 session lost, hunting for new mon
[691243.950361] libceph: mon2 (1)192.168.21.13:6789 session established
[691243.951107] libceph: client64579 fsid b5426b62-1ecd-11ed-90ab-8f774f76a3a8

and cat /proc/mounts | grep ceph:

192.168.21.11,192.168.21.12,192.168.21.13:/ /mnt ceph rw,relatime,name=admin,secret=<hidden>,fsid=00000000-0000-0000-0000-000000000000,acl,mds_namespace=cephfs 0 0

pvc/pv's:

root@control-11:~# kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                    STORAGECLASS    REASON   AGE
pvc-fe7981e4-148e-4f31-bf09-205ad2ba36ed   1Gi        RWX            Delete           Bound    default/csi-cephfs-pvc   csi-cephfs-sc            25m
root@control-11:~# kubectl get pvc
NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS    AGE
csi-cephfs-pvc   Bound    pvc-fe7981e4-148e-4f31-bf09-205ad2ba36ed   1Gi        RWX            csi-cephfs-sc   27m

pod:

root@control-11:~# kubectl get pods csi-cephfs-demo-pod
NAME                  READY   STATUS              RESTARTS   AGE
csi-cephfs-demo-pod   0/1     ContainerCreating   0          21m
root@control-11:~# kubectl describe pod csi-cephfs-demo-pod
Name:             csi-cephfs-demo-pod
Namespace:        default
Priority:         0
Service Account:  default
Node:             worker-11/192.168.31.21
Start Time:       Thu, 15 Sep 2022 13:29:37 +0200
Labels:           <none>
Annotations:      <none>
Status:           Pending
IP:
IPs:              <none>
Containers:
  web-server:
    Container ID:
    Image:          docker.io/library/nginx:latest
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/lib/www from mypvc (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-v8qv7 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  mypvc:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  csi-cephfs-pvc
    ReadOnly:   false
  kube-api-access-v8qv7:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason              Age                 From                     Message
  ----     ------              ----                ----                     -------
  Normal   Scheduled           22m                 default-scheduler        Successfully assigned default/csi-cephfs-demo-pod to worker-11
  Warning  FailedMount         10m (x2 over 17m)   kubelet                  Unable to attach or mount volumes: unmounted volumes=[mypvc], unattached volumes=[kube-api-access-v8qv7 mypvc]: timed out waiting for the condition
  Warning  FailedAttachVolume  113s (x9 over 20m)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-fe7981e4-148e-4f31-bf09-205ad2ba36ed" : timed out waiting for external-attacher of cephfs.csi.ceph.com CSI driver to attach volume 0001-0024-b5426b62-1ecd-11ed-90ab-8f774f76a3a8-0000000000000001-3def7d0a-34e9-11ed-9616-9274cb35d772
  Warning  FailedMount         107s (x7 over 19m)  kubelet                  Unable to attach or mount volumes: unmounted volumes=[mypvc], unattached volumes=[mypvc kube-api-access-v8qv7]: timed out waiting for the condition

@vshankar

vshankar commented Sep 15, 2022

@vshankar It hangs for a bit and then returns to the prompt. Relevant dmesg messages:

[691243.945358] libceph: mon1 (1)192.168.21.12:6789 session established
[691243.945750] libceph: mon1 (1)192.168.21.12:6789 socket closed (con state OPEN)
[691243.945764] libceph: mon1 (1)192.168.21.12:6789 session lost, hunting for new mon
[691243.950361] libceph: mon2 (1)192.168.21.13:6789 session established
[691243.951107] libceph: client64579 fsid b5426b62-1ecd-11ed-90ab-8f774f76a3a8

and cat /proc/mounts | grep ceph:

192.168.21.11,192.168.21.12,192.168.21.13:/ /mnt ceph rw,relatime,name=admin,secret=<hidden>,fsid=00000000-0000-0000-0000-000000000000,acl,mds_namespace=cephfs 0 0

That is the cephfs mount then, isn't it?

stat /mnt?

@Informize

@vshankar yes indeed, with the mount command I can manually mount it on the worker node

@vshankar

@vshankar yes indeed, with the mount command I can manually mount it on the worker node

OK. Is the same command being run by the ceph-csi plugin? Can you enable mount helper debugging when it's being run by ceph-csi?

@mchangir

I wonder what the cat /proc/mounts | grep ceph output is where the mount succeeds at the command line.

@Informize

mount helper debugging when it's being run by ceph-csi?

Sorry, can you point out where to look for how to enable debugging in ceph-csi?

@vshankar

mount helper debugging when it's being run by ceph-csi?

Sorry, can you point out where to look for how to enable debugging in ceph-csi?

I have no idea. @Madhu-1 @humblec might know.

@vshankar

I wonder what the cat /proc/mounts | grep ceph output is where the mount succeeds at the command line.

/proc/mounts has the record of the cephfs mount, as mentioned in this comment: #3309 (comment)

@mchangir

I wonder what the cat /proc/mounts | grep ceph output is where the mount succeeds at the command line.

/proc/mounts has the record of the cephfs mount, as mentioned in this comment: #3309 (comment)

I presume that comment lists the contents of /proc/mounts when the command doesn't succeed as desired.
What I meant to ask for was the output of /proc/mounts when the command succeeds immediately, to verify the fsid listed in that output against the failed/delayed version.

@Madhu-1
Collaborator

Madhu-1 commented Sep 19, 2022

mount helper debugging when it's being run by ceph-csi?

Sorry, can you point out where to look for how to enable debugging in ceph-csi?

I have no idea. @Madhu-1 @humblec might know.

There is no option for doing that; you can exec into the csi-cephfsplugin container and run the mount command manually with debug flags.
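
For example, something along these lines (a sketch; the namespace, pod name, and mount arguments are placeholders based on the earlier comments):

kubectl -n <csi-namespace> exec -it csi-cephfsplugin-4pkct -c csi-cephfsplugin -- bash
# inside the container, repeat the mount the plugin attempted, with -v for verbose output
mount -v -t ceph 192.168.21.11,192.168.21.12,192.168.21.13:/ /mnt -o name=admin,secretfile=/tmp/auth.key,_netdev,fs=cephfs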

@Informize

It mounts manually, but not from a pod. Debug flags are set, as you can see in this comment: #3309 (comment)

@vshankar

@Informize Without debug logs from a failed mount instance, it's hard to tell what's going on.

@Informize

@vshankar

@Madhu-1

OS on ceph nodes and k8s control/worker nodes are all:

Linux control-11 5.4.0-125-generic #141-Ubuntu SMP Wed Aug 10 13:42:03 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Version of cephcsi that works:

kubectl exec -it csi-cephfsplugin-4pkct -c csi-cephfsplugin bash
# /usr/local/bin/cephcsi --version
Cephcsi Version: v3.5-canary
Git Commit: c374edcbaa2a5dd364a9d526728e1629cd666a82
Go Version: go1.17.5
Compiler: gc
Platform: linux/amd64
Kernel: 5.4.0-125-generic

and mount command:

[root@worker-12 /]# mount -v -t ceph 192.168.21.11,192.168.21.12,192.168.21.13:/ /mnt -o 'name=admin,secretfile=/tmp/auth.key,_netdev,fs=cephfs'
parsing options: rw,name=admin,secretfile=/tmp/auth.key,fs=cephfs,_netdev
mount.ceph: options "name=admin,mds_namespace=cephfs" will pass to kernel.
[root@worker-12 /]# df -h /mnt
Filesystem                                   Size  Used Avail Use% Mounted on
192.168.21.11,192.168.21.12,192.168.21.13:/  373G     0  373G   0% /mnt

With ceph-csi 3.7

[root@worker-12 /]# /usr/local/bin/cephcsi --version
Cephcsi Version: v3.7-canary
Git Commit: 468c73d2b61a955503bd82e083b209f73e62a12e
Go Version: go1.18.5
Compiler: gc
Platform: linux/amd64
Kernel: 5.4.0-125-generic

And mount command in verbose mode:

[root@worker-12 /]# mount -v -t ceph 192.168.21.11,192.168.21.12,192.168.21.13:/ /mnt -o 'name=admin,secretfile=/tmp/auth.key,_netdev,fs=cephfs'
parsing options: rw,name=admin,secretfile=/tmp/auth.key,fs=cephfs,_netdev
mount.ceph: options "name=admin,mds_namespace=cephfs".
invalid new device string format
unable to get monitor info from DNS SRV with service name: ceph-mon
keyring.get_secret failed
2022-09-15T09:00:11.472+0000 7f0946866f40 -1 failed for service _ceph-mon._tcp
mount.ceph: resolved to: "192.168.21.11,192.168.21.12,192.168.21.13"
mount.ceph: trying mount with old device syntax: 192.168.21.11,192.168.21.12,192.168.21.13:/
mount.ceph: options "name=admin,mds_namespace=cephfs,key=admin,fsid=00000000-0000-0000-0000-000000000000" will pass to kernel

This is all the debug information that I have. Is there a way to get extra debug information? I see there is also another GitHub issue related to this one: #3390

@vshankar

This is all the debug information that I have. Is there a way to get extra debug information? I see there is also another GitHub issue related to this one: #3390

Do you have dmesg logs when the mount fails from the pod?

@Madhu-1
Collaborator

Madhu-1 commented Sep 22, 2022

@Informize I don't see any mount failure in your case. Do you think the mount is failing? Can you please provide the cephfsplugin container logs?

@Informize

Events:
  Type     Reason              Age                 From                     Message
  ----     ------              ----                ----                     -------
  Normal   Scheduled           22m                 default-scheduler        Successfully assigned default/csi-cephfs-demo-pod to worker-11
  Warning  FailedMount         10m (x2 over 17m)   kubelet                  Unable to attach or mount volumes: unmounted volumes=[mypvc], unattached volumes=[kube-api-access-v8qv7 mypvc]: timed out waiting for the condition
  Warning  FailedAttachVolume  113s (x9 over 20m)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-fe7981e4-148e-4f31-bf09-205ad2ba36ed" : timed out waiting for external-attacher of cephfs.csi.ceph.com CSI driver to attach volume 0001-0024-b5426b62-1ecd-11ed-90ab-8f774f76a3a8-0000000000000001-3def7d0a-34e9-11ed-9616-9274cb35d772
  Warning  FailedMount         107s (x7 over 19m)  kubelet                  Unable to attach or mount volumes: unmounted volumes=[mypvc], unattached volumes=[mypvc kube-api-access-v8qv7]: timed out waiting for the condition

Here is where the mount is failing.

@Zempashi

Zempashi commented Oct 15, 2022

csi-cephfsplugin-q4wrr csi-cephfsplugin E1015 07:14:58.049484       1 nodeserver.go:273] ID: 98 Req-ID: 0001-0009-rook-ceph-0000000000000001-e305d04a-4c56-11ed-b1c9-bad2e6b34b46 failed to mount volume 0001-0009-rook-ceph-0000000000000001-e305d04a-4c56-11ed-b1c9-bad2e6b34b46: an error (exit status 32) occurred while running mount args: [-t ceph 172.16.31.1:6789,172.16.31.2:6789,172.16.31.3:6789:/volumes/csi/csi-vol-e305d04a-4c56-11ed-b1c9-bad2e6b34b46/1de9ce81-0c1b-40f1-9a06-7bd9147b17fd /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com/4d6a495ec8fb8ebee362db7492c7cfe27425c92c4ab6a98daf91f3b69f37227c/globalmount -o name=csi-cephfs-node,secretfile=/tmp/csi/keys/keyfile-1117944320,mds_namespace=ceph-filesystem,_netdev] stderr: unable to get monitor info from DNS SRV with service name: ceph-mon
csi-cephfsplugin-q4wrr csi-cephfsplugin 2022-10-15T07:11:53.545+0000 7fd361a91f40 -1 failed for service _ceph-mon._tcp
csi-cephfsplugin-q4wrr csi-cephfsplugin mount error 110 = Connection timed out
csi-cephfsplugin-q4wrr csi-cephfsplugin  Check dmesg logs if required.

This happens on Rook 1.10.2 with an external Ceph cluster installed via cephadm (version 16.2.10); the Kubernetes nodes are on Arch Linux, kernel 5.19.5-arch1-1.

No really relevant dmesg errors:

# dmesg | grep ceph 
[123793.160045] libceph: mon1 (1)172.16.31.2:6789 session established
[123793.166842] libceph: client88753 fsid f7238ede-4bab-11ed-b520-0008a20c73ec
[124041.551742] libceph: mon1 (1)172.16.31.2:6789 session established
[124041.553692] libceph: client88993 fsid f7238ede-4bab-11ed-b520-0008a20c73ec
[124202.424541] libceph: mon1 (1)172.16.31.2:6789 session established
[124202.427140] libceph: client89176 fsid f7238ede-4bab-11ed-b520-0008a20c73ec
[124450.617950] libceph: mon0 (1)172.16.31.1:6789 session established
[124450.619957] libceph: client78993 fsid f7238ede-4bab-11ed-b520-0008a20c73ec
[124692.657906] libceph: mon2 (1)172.16.31.3:6789 session established
[124692.661706] libceph: client89738 fsid f7238ede-4bab-11ed-b520-0008a20c73ec
[124934.731612] libceph: mon2 (1)172.16.31.3:6789 session established
[124934.733545] libceph: client89969 fsid f7238ede-4bab-11ed-b520-0008a20c73ec
[125176.752300] libceph: mon0 (1)172.16.31.1:6789 session established
[125176.757338] libceph: client79680 fsid f7238ede-4bab-11ed-b520-0008a20c73ec
[125418.846206] libceph: mon2 (1)172.16.31.3:6789 session established
[125418.848306] libceph: client90452 fsid f7238ede-4bab-11ed-b520-0008a20c73ec

Same error when trying to mount manually (inside the container; Arch Linux removed the Ceph libraries from its repositories a few days ago), and the volume doesn't get mounted:

# crictl exec -ti 97e8c71b2ca56 bash
# mount -t ceph 172.16.31.1:6789,172.16.31.2:6789,172.16.31.3:6789:/volumes/csi/csi-vol-e305d04a-4c56-11ed-b1c9-bad2e6b34b46/1de9ce81-0c1b-40f1-9a06-7bd9147b17fd /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com/4d6a495ec8fb8ebee362db7492c7cfe27425c92c4ab6a98daf91f3b69f37227c/globalmount -o name=csi-cephfs-node,secretfile=/tmp/csi/keys/keyfile-3681000612,mds_namespace=ceph-filesystem,_netdev
unable to get monitor info from DNS SRV with service name: ceph-mon
2022-10-15T07:29:47.176+0000 7f6117edef40 -1 failed for service _ceph-mon._tcp
mount error 110 = Connection timed out
# mount | grep ceph
/dev/sda3 on /etc/ceph-csi-config type ext4 (ro,relatime,data=ordered)
tmpfs on /var/lib/kubelet/pods/1b34caf0-80de-4557-9fff-02ef96c04947/volumes/kubernetes.io~projected/ceph-csi-configs type tmpfs (rw,relatime,size=2097152k,inode64)
tmpfs on /var/lib/kubelet/pods/35e5886b-4939-4312-91ce-a628dd5979bf/volumes/kubernetes.io~projected/ceph-csi-configs type tmpfs (rw,relatime,size=1310720k,inode64)
tmpfs on /var/lib/kubelet/pods/75c839ea-d1b9-4df6-8cc8-fb7f149954f3/volumes/kubernetes.io~secret/rook-ceph-mds-ceph-filesystem-a-keyring type tmpfs (rw,relatime,size=16252404k,inode64)
tmpfs on /var/lib/kubelet/pods/109e62d6-41e2-435b-b8c8-a4e8c943bc80/volumes/kubernetes.io~secret/rook-ceph-crash-collector-keyring type tmpfs (rw,relatime,size=61440k,inode64)
tmpfs on /var/lib/kubelet/pods/166d5b99-4f47-4d68-8bf3-be181b04d4bd/volumes/kubernetes.io~secret/rook-ceph-rgw-ceph-objectstore-a-keyring type tmpfs (rw,relatime,size=16252404k,inode64)

@Zempashi

I can confirm that running Rook 1.9.12 and forcing cephcsi:3.5.1 solves the issue (Rook 1.10.x supports cephcsi >= 3.6.0, which does NOT solve the issue).
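
For anyone applying the same workaround on Rook, one way to pin the cephcsi image is through the operator ConfigMap (a sketch, assuming the default rook-ceph namespace and the ROOK_CSI_CEPH_IMAGE override; adjust to your deployment):

kubectl -n rook-ceph patch configmap rook-ceph-operator-config \
  --type merge -p '{"data":{"ROOK_CSI_CEPH_IMAGE":"quay.io/cephcsi/cephcsi:v3.5.1"}}'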

@Madhu-1
Collaborator

Madhu-1 commented Oct 17, 2022

Can someone provide me with the setup details/a reproducer? I would like to reproduce it locally and see what is wrong.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions bot added the wontfix (This will not be worked on) label on Nov 16, 2022
@github-actions

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Nov 23, 2022
@remram44

remram44 commented Dec 8, 2022

This seems to be an active issue, where the only workaround is downgrading cephcsi. There is also an open PR for it. Should it be reopened?

As a data point, it affects me too:

an error (exit status 234) occurred while running mount args: [-t ceph v2:192.168.61.11:3300/0,v1:192.168.61.11:6789/0,v2:192.168.61.12:3300/0,v1:192.168.61.12:6789/0:/volumes/csi/csi-vol-192d6009-bfe4-4410-8c7a-c65ca73d5d1e/2917ff3d-8acd-48b5-aab8-0f1f876b4266 /var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.csi.ceph.com/fbbcc818e6f7c6ec9395d4bb29b400ede4fc12b6d18c93c138f2c96772be1d83/globalmount -o name=kubernetes07,secretfile=/tmp/csi/keys/keyfile-634581935,mds_namespace=kubernetes07,_netdev] stderr: unable to get monitor info from DNS SRV with service name: ceph-mon

Changing the monitors entry in the CSI ConfigMap fixes it; from:

"monitors": [
  "v2:192.168.61.11:3300/0,v1:192.168.61.11:6789/0",
  "v2:192.168.61.12:3300/0,v1:192.168.61.12:6789/0"
]

to:

"monitors": [
  "192.168.61.11",
  "192.168.61.12"
]

@vshankar

vshankar commented Dec 9, 2022

This seems to be an active issue, where the only workaround is downgrading cephcsi. There is also an open PR for it. Should it be reopened?

There is a PR (ceph/ceph#48873) in Ceph to fix a mount issue. Are you referring to that one or to some other fix?
