Pod using CephFS volume shows two mount points after minion reboot #778

Closed
jianglingxia opened this issue Jan 8, 2020 · 6 comments


jianglingxia commented Jan 8, 2020

Describe the bug

The application pod initially ran on minion 172.20.0.3 using a CephFS volume, and the mount point was on 172.20.0.3:

[root@paas-controller-172-20-0-2:/paasdata/op-data/op-storage-ceph_csi_driver/tasks]$ kubectl get po -n opcs -w -o wide
NAME                    READY   STATUS    RESTARTS   AGE     IP            NODE         NOMINATED NODE   READINESS GATES
nginx1839-1-g6s8c       1/1     Running   0          45s     100.100.0.7   172.20.0.3   <none>           <none>

[root@paas-controller-172-20-0-3:/home/ubuntu]$ df -h |grep ceph
tmpfs                      63G   12K   63G   1% /paasdata/docker/pods/dcf3bd65-31b5-11ea-85fc-744aa4028242/volumes/kubernetes.io~secret/ceph1-cephfs-csi-nodeplugin-token-t78vc
ceph-fuse                 1.0G     0  1.0G   0% /paasdata/docker/plugins/kubernetes.io/csi/pv/pvc-f41f0c1d-30f5-11ea-b694-744aa402809e/globalmount

Then I rebooted the 172.20.0.3 minion, and the pod was automatically rescheduled to 172.20.0.2:

opcs          nginx1839-1-8bhxj                      1/1     Running   0          4m54s   100.100.0.7    172.20.0.2

The mount point on 172.20.0.2 is:

[root@paas-controller-172-20-0-2:/home/ubuntu]$ df -h |grep ceph
tmpfs                      63G   12K   63G   1% /paasdata/docker/pods/dcf34ca0-31b5-11ea-85fc-744aa4028242/volumes/kubernetes.io~secret/ceph1-cephfs-csi-nodeplugin-token-t78vc
ceph-fuse                 1.0G     0  1.0G   0% /paasdata/docker/plugins/kubernetes.io/csi/pv/pvc-f41f0c1d-30f5-11ea-b694-744aa402809e/globalmount

But on 172.20.0.3 the original mount point still exists; maybe it needs to be unmounted?
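For reference, a minimal manual cleanup on the rebooted minion might look like this (a sketch only: it assumes no pod on 172.20.0.3 still uses the volume; the globalmount path is taken from the df output above):

# list any leftover ceph-fuse mounts on the node
grep ceph-fuse /proc/mounts
# unmount the stale staging (globalmount) path
umount /paasdata/docker/plugins/kubernetes.io/csi/pv/pvc-f41f0c1d-30f5-11ea-b694-744aa402809e/globalmount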

Environment details

  • Image/version of Ceph CSI driver: v1.2.0
  • Kubernetes cluster version: v1.13.6
jianglingxia (Author) commented

@Madhu-1
thanks

Madhu-1 (Collaborator) commented Feb 25, 2020

@jianglingxia please add steps to reproduce this issue

jianglingxia (Author) commented

1) Reboot the minion:
[root@paas-controller-172-20-0-2:/home/ubuntu]$ reboot
systemdctrl halt down for system reboot, 0x01
Connection to 172.20.0.2 closed by remote host.
Connection to 172.20.0.2 closed.

2) Pods on minion 172.20.0.2 are then rescheduled to 172.20.0.3 and 172.20.0.4:
kubectl get nodes

172.20.0.2 NotReady 71d v1.13.6
172.20.0.3 Ready 71d v1.13.6
172.20.0.4 Ready 71d v1.13.6

nginx4-1-trrdw 0/1 Terminating 0 18h 172.20.0.2
nginx1-1-7frmn 0/1 Terminating 0 18h 172.20.0.2
nginx1-1-7frmn 0/1 Terminating 0 18h 172.20.0.2
nginx3-1-jr6v4 0/1 Terminating 0 18h 172.20.0.2
nginx3-1-jr6v4 0/1 Terminating 0 18h 172.20.0.2
nginx5-1-xcjhp 0/1 Terminating 0 5h56m 172.20.0.2
nginx5-1-xcjhp 0/1 Terminating 0 5h56m 172.20.0.2
nginx4-1-kmsjj 1/1 Running 0 4m20s 100.100.1.0 172.20.0.3
nginx3-1-nxnmz 1/1 Running 0 4m21s 100.100.0.255 172.20.0.4
nginx5-1-9xd2n 1/1 Running 0 4m21s 100.100.1.1 172.20.0.3
nginx1-1-qrjcd 1/1 Running 0 4m22s 100.100.0.253 172.20.0.4

3) The pod mount paths on the minion are:
[root@paas-controller-172-20-0-4:/home/ubuntu]$ df -h |grep ceph-fuse
ceph-fuse 500G 0 500G 0% /paasdata/docker/plugins/kubernetes.io/csi/pv/pvc-35aa2c2b-549c-11ea-8500-744aa4028242/globalmount
ceph-fuse 500G 0 500G 0% /paasdata/docker/plugins/kubernetes.io/csi/pv/pvc-35a5403a-549c-11ea-8500-744aa4028242/globalmount
ceph-fuse 500G 0 500G 0% /paasdata/docker/plugins/kubernetes.io/csi/pv/pvc-35af6501-549c-11ea-8500-744aa4028242/globalmount

Minion 172.20.0.2 no longer has any pods, but because the minion rebooted, the csiplugin restarted and the driver's mountcache function remounted one volume, so the minion still has a leftover mount path. So I think the mountcache function from #282 may have a bug? See #836.

Log file created at: 2020/02/22 15:32:48
Running on machine: paas-controller-172-20-0-2
Binary: Built with gc go1.11.6 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0222 15:32:48.273042 1 cephcsi.go:103] Driver version: v1 and Git version: a5eac7346fd78c4d625d46d92bd8f378719bb44d
I0222 15:32:48.281687 1 cachepersister.go:45] cache-perister: using kubernetes configmap as metadata cache persister
I0222 15:32:48.294017 1 cephcsi.go:158] Starting driver type: cephfs with name: cephfs.csi.ceph.com
I0222 15:32:48.357121 1 util.go:48] cephfs: EXEC uname [-r]
I0222 15:32:48.357887 1 volumemounter.go:77] kernel version < 4.17 might not support quota feature, hence not loading kernel client
I0222 15:32:49.090815 1 volumemounter.go:82] loaded mounter: fuse
I0222 15:32:49.091099 1 mountcache.go:59] mount-cache: name: cephfs.csi.ceph.com, version: v1, mountCacheDir: /mount-cache-dir
I0222 15:32:49.116049 1 util.go:48] cephfs: EXEC ceph [-m 100.100.100.148,100.100.100.154,100.100.100.156 --id admin --keyfile=stripped -c /etc/ceph/ceph.conf fs dump --format=json]
I0222 15:32:55.138053 1 util.go:48] cephfs: EXEC ceph [-m 100.100.100.148,100.100.100.154,100.100.100.156 --id admin --keyfile=stripped -c /etc/ceph/ceph.conf fs ls --format=json]
I0222 15:32:56.613129 1 mount_linux.go:160] Detected OS without systemd
I0222 15:32:56.613213 1 util.go:48] cephfs: EXEC ceph-fuse [/paasdata/docker/plugins/kubernetes.io/csi/pv/pvc-35a5403a-549c-11ea-8500-744aa4028242/globalmount -m 100.100.100.148,100.100.100.154,100.100.100.156 -c /etc/ceph/ceph.conf -n client.admin --keyfile=stripped -r /csi-volumes/csi-vol-36021ecc-549c-11ea-abf0-fa163e3628e5 -o nonempty --client_mds_namespace=cephfs1219]
I0222 15:33:01.249513 1 util.go:48] cephfs: EXEC mount [-o bind /paasdata/docker/plugins/kubernetes.io/csi/pv/pvc-35a5403a-549c-11ea-8500-744aa4028242/globalmount /paasdata/docker/pods/176b25eb-54ac-11ea-9d2e-744aa4028226/volumes/kubernetes.io~csi/pvc-35a5403a-549c-11ea-8500-744aa4028242/mount]
I0222 15:33:01.250240 1 mountcache.go:165] mount-cache: successfully bind-mounted volume 0001-0024-96c7e5a6-04ac-11ea-8b4e-84139f31690d-0000000000000004-36021ecc-549c-11ea-abf0-fa163e3628e5: /paasdata/docker/plugins/kubernetes.io/csi/pv/pvc-35a5403a-549c-11ea-8500-744aa4028242/globalmount /paasdata/docker/pods/176b25eb-54ac-11ea-9d2e-744aa4028226/volumes/kubernetes.io~csi/pvc-35a5403a-549c-11ea-8500-744aa4028242/mount false
I0222 15:33:01.250289 1 mountcache.go:84] mount-cache: successfully remounted volume 0001-0024-96c7e5a6-04ac-11ea-8b4e-84139f31690d-0000000000000004-36021ecc-549c-11ea-abf0-fa163e3628e5
I0222 15:33:01.250304 1 mountcache.go:99] mount-cache: successfully remounted 1 volumes
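To see what the mount-cache would replay on the next restart, one can inspect the cache directory inside the node plugin container; this is a sketch, and the namespace/pod placeholders and the container name csi-cephfsplugin are assumptions based on a typical ceph-csi deployment (the mountCacheDir path comes from the log above):

# list the persisted mount-cache entries inside the node plugin container
kubectl -n <namespace> exec <cephfs-nodeplugin-pod> -c csi-cephfsplugin -- ls /mount-cache-dir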

Madhu-1 (Collaborator) commented Feb 25, 2020

Yes, #282 is buggy and has been removed in the master branch.

huaizong commented

> [quoted the reproduction steps and log from the comment above]
mount-cache will automatically remount the path when the csi container restarts, and will unmount it if the kubernetes api server deletes the pod on the node later.
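If that is the behavior, the eventual unmount should appear in the node plugin log once the stale pod object is deleted. A hedged way to watch for it (the namespace and pod name are placeholders, and the container name csi-cephfsplugin is an assumption):

# follow the node plugin log and filter for mount-cache activity
kubectl -n <namespace> logs <cephfs-nodeplugin-pod> -c csi-cephfsplugin | grep mount-cache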

humblec (Collaborator) commented Sep 29, 2020

@huaizong this shouldn't be an issue anymore with the latest versions. Closing this for now. Please feel free to reopen if required.

humblec closed this as completed on Sep 29, 2020.