Host mount not removed and can be reused if multiple CSI-SMB PV/PVCs use the same network address #353

Closed
snazzysoftware opened this issue Sep 21, 2021 · 20 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@snazzysoftware

What happened:
The mount point on the underlying Kubernetes host for a CSI-SMB PV/PVC is not unmounted if one or more other PV/PVCs remain that are configured with the same shared-folder network address. The host mount point that remains is then reused by the CSI-SMB driver for a future CSI-SMB PV with the same name, so the new PV gains access to the shared folder without needing to provide correct credentials.

What you expected to happen:
I would expect that PVs used by separate pods and configured to connect to the same shared drive would be mounted and unmounted without affecting each other.

How to reproduce it:

See the attached diagram.png for an overview of the deployment timelines for this issue.

The failure scenario involves:

  • a long-lived deployment connected to a shared network folder.
  • the creation of two short-lived deployments connected to the same shared network folder address.
  1. The long-lived deployment is created first and remains running for the duration of the test.
  2. The short-lived deployment is then installed. When this deployment is uninstalled, the host mount fails to be cleaned up.
  3. When the second short-lived deployment is installed, the previous host mount is reused. This gives the second deployment access to the host mount point without the correct login credentials.

All the Kubernetes definition files used in the test steps below are included in the attached yaml_files.zip file.

The steps assume availability of an SMB network share at '//10.44.131.76/testshare' with username 'testuser' and password 'correctpassword'.
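Since yaml_files.zip is attached rather than reproduced inline, here is a minimal sketch of what the long-lived Secret, PV and PVC would look like for the smb.csi.k8s.io driver, following the driver's documented static-provisioning format; the capacity, namespace and volumeHandle values are assumptions rather than the exact attached files, and the short-lived resources are analogous under their own names:

apiVersion: v1
kind: Secret
metadata:
  name: long-lived-secret
type: Opaque
stringData:
  username: testuser
  password: correctpassword
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: long-lived-pv
spec:
  capacity:
    storage: 10Gi                      # assumed size
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
    - dir_mode=0777
    - file_mode=0777
    - vers=3.0
  csi:
    driver: smb.csi.k8s.io
    volumeHandle: long-lived-pv        # must be unique within the cluster
    volumeAttributes:
      source: "//10.44.131.76/testshare"
    nodeStageSecretRef:
      name: long-lived-secret
      namespace: default               # assumed namespace
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: long-lived-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: long-lived-pv
  resources:
    requests:
      storage: 10Gi                    # assumed size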

  1. Install the long-lived deployment:
    1. kubectl apply -f long-lived-secret.yaml
    2. kubectl apply -f long-lived-pv.yaml
    3. kubectl apply -f long-lived-pvc.yaml
    4. kubectl apply -f long-lived-deployment.yaml
  2. Confirm that the network share has been mounted on the host and into the pod:
    1. mount | grep testshare
      //10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/long-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
      //10.44.131.76/testshare on /var/lib/kubelet/pods/96edf4c3-50a2-403a-8cfa-de829eead8ea/volumes/kubernetes.io~csi/long-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
    2. kubectl get pods
      long-lived-deployment-547474ff76-7kx9p 1/1 Running 0 3m20s
    3. kubectl exec -it long-lived-deployment-547474ff76-7kx9p -- ls /mnt/smb
      ...
      test.txt
  3. Install the short-lived deployment with the correct password:
    1. kubectl apply -f short-lived-secret.yaml
    2. kubectl apply -f short-lived-pv.yaml
    3. kubectl apply -f short-lived-pvc.yaml
    4. kubectl apply -f short-lived-deployment.yaml
  4. Confirm that the network share has been mounted on the host and into the pod:
    1. mount | grep testshare
      //10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/long-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
      //10.44.131.76/testshare on /var/lib/kubelet/pods/96edf4c3-50a2-403a-8cfa-de829eead8ea/volumes/kubernetes.io~csi/long-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
      //10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/short-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
      //10.44.131.76/testshare on /var/lib/kubelet/pods/19951d18-9951-4d13-9568-ccc3a89cd30d/volumes/kubernetes.io~csi/short-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
    2. kubectl get pods
      short-lived-deployment-6d48cdc984-gwx4b 1/1 Running 0 85s
    3. kubectl exec -it short-lived-deployment-6d48cdc984-gwx4b -- ls /mnt/smb
      ...
      test.txt
  5. Uninstall the short-lived deployment:
    1. kubectl delete deployment short-lived-deployment
    2. kubectl delete pvc short-lived-pvc
    3. kubectl delete pv short-lived-pv
    4. kubectl delete secret short-lived-secret
  6. Unexpected result: the short-lived-pv has not been unmounted from the host:
    1. mount | grep testshare
      //10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/long-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
      //10.44.131.76/testshare on /var/lib/kubelet/pods/96edf4c3-50a2-403a-8cfa-de829eead8ea/volumes/kubernetes.io~csi/long-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
      //10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/short-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
  7. Re-install the short-lived deployment using an incorrect password:
    1. kubectl apply -f short-lived-invalid-password-secret.yaml
    2. kubectl apply -f short-lived-pv.yaml
    3. kubectl apply -f short-lived-pvc.yaml
    4. kubectl apply -f short-lived-deployment.yaml
  8. Unexpected result: the short-lived deployment starts successfully and has access to the mount, despite using the incorrect credentials:
    1. kubectl get pods
      short-lived-deployment-6d48cdc984-8s2vm 1/1 Running 0 32s 
    2. kubectl exec -it short-lived-deployment-6d48cdc984-8s2vm -- ls /mnt/smb
      ...
      test.txt

Anything else we need to know?:

  • Please see the attached diagram.png for an overview of the deployment timelines for this issue.
  • Please see yaml_files.zip for all the Kubernetes definition files used in the test steps above.

Environment:

  • CSI Driver version:

kubernetes-csi/csi-driver-smb v1.2.0:
https://github.com/kubernetes-csi/csi-driver-smb/releases/tag/v1.2.0

  • Kubernetes version (use kubectl version):

'kubectl version' output:
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2+k3s1", GitCommit:"5a67e8dc473f8945e8e181f6f0b0dbbc387f6fca", GitTreeState:"clean", BuildDate:"2021-06-21T20:52:44Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2+k3s1", GitCommit:"5a67e8dc473f8945e8e181f6f0b0dbbc387f6fca", GitTreeState:"clean", BuildDate:"2021-06-21T20:52:44Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}

Using K3s distribution of Kubernetes.

  • OS (e.g. from /etc/os-release):

'cat /etc/os-release' output:
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

  • Kernel (e.g. uname -a):

'uname -a' output:
Linux ussd-tst-bacn05.edgeos.illumina.com 3.10.0-1127.el7.x86_64 #1 SMP Tue Mar 31 23:36:51 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools:
  • Others:
@snazzysoftware
Author

yaml_files.zip

@snazzysoftware
Author

diagram.png (attached image)

@andyzhangx
Member

Currently one PV on one node has only one SMB mount, shared by multiple pods, so it's one mount per PV rather than one mount per pod; this reduces the number of SMB mounts on the node.
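
As an illustration of that pattern (not part of the original report), multiple pod replicas can reference a single PVC so that they share the PV's one mount on each node; a minimal sketch, with the deployment name and image assumed:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: shared-smb-consumer            # assumed name
spec:
  replicas: 2                          # both replicas reuse the PV's single mount on a node
  selector:
    matchLabels:
      app: shared-smb-consumer
  template:
    metadata:
      labels:
        app: shared-smb-consumer
    spec:
      containers:
        - name: app
          image: busybox               # assumed image
          command: ["sleep", "86400"]
          volumeMounts:
            - name: smb
              mountPath: /mnt/smb
      volumes:
        - name: smb
          persistentVolumeClaim:
            claimName: long-lived-pvc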

@snazzysoftware
Author

Hi @andyzhangx ,

Thank you for getting back to me so quickly. Unfortunately, I'm working on a project where this workaround won't be possible.

My project involves a web interface where different users are able to log in and independently specify different network storage configurations (address, username and password). The system will then create separate PVs and start a pod that outputs data to each separate PV.

Sharing a PV between pods would require that end-users coordinate to use only a single network share, with a shared username and password, which doesn't fit the user interface design. I think that a Kubernetes clustered environment should support more than one user simultaneously, but independently, using the same network host address with different credentials.

Would it be possible for the CSI-SMB driver to support more than one PV referencing the same network host address?

Thank you for your help,

Best regards,
@snazzysoftware (Adam)

@andyzhangx
Member

I think you could set up multiple PVs with different settings, e.g. network share, username

@snazzysoftware
Author

Hi @andyzhangx ,

I agree that users could all specify their own configuration that uses different settings relative to each other.

The difficulty is how the users can reliably avoid specifying a configuration that is already in use. This is especially true when new users try a common, well-known network share to make sure the product is working for them.

If my project needs to enforce this uniqueness, it will need to present error messages to the user like: "You cannot use this network share configuration because it is in use by another user on the system." This is a confusing error message, especially for a shared cluster environment where user segregation is normally assured.

Would it be possible for the CSI-SMB driver to support more than one PV referencing the same network host address, share name and username combination?

Thank you for your help,

Best regards,
@snazzysoftware (Adam)

@andyzhangx
Member

CSI-SMB driver supports multiple PVs referencing the same network host address, share name and username combination

@snazzysoftware
Author

CSI-SMB driver supports multiple PVs referencing the same network host address, share name and username combination

The bug reproduction steps above show that OS mount points are not correctly unmounted if multiple PVs are created that reference the same network host address, share name and username combination and then one of these PVs is deleted.

Have I provided enough information in my steps above for you to reproduce this bug?

Thank you for your help,

Best regards,
@snazzysoftware (Adam)

@andyzhangx
Member

@snazzysoftware could you provide the node driver logs from that agent node, following https://github.com/kubernetes-csi/csi-driver-smb/blob/master/docs/csi-debug.md#case2-volume-mountunmount-failed?

If a PV is not used by any pod on the node, it should be unmounted when the last pod is terminated; there should be a NodeUnstageVolume call in the node driver logs.

@snazzysoftware
Author

snazzysoftware commented Sep 22, 2021

Hi @andyzhangx ,

I've followed the bug reproduction steps above and captured timings and logs. Please see the following test transcript that shows the commands I ran and the main timestamps. Please also find attached a zip of all the logs that include a warning from the NodeUnpublishVolume step (in cs-smb-node-3.log):

W0922 11:36:52.595750       1 mount_helper_common.go:133] Warning: "/var/lib/kubelet/pods/eba5a0dd-2148-40c3-98b1-051373948e02/volumes/kubernetes.io~csi/short-lived-pv/mount" is not a mountpoint, deleting

Test Transcript

$ date -u
Wed 22 Sep 11:29:08 UTC 2021

Install the long-lived deployment:

$ kubectl apply -f long-lived-secret.yaml
$ kubectl apply -f long-lived-pv.yaml
$ kubectl apply -f long-lived-pvc.yaml
$ kubectl apply -f long-lived-deployment.yaml

Capture logs 1:

$ kubectl get pod -n kube-system | grep csi-smb-controller
rrr-csi-smb-controller-54958bc88c-9h7bg            3/3     Running     9          8d
$ kubectl logs rrr-csi-smb-controller-54958bc88c-9h7bg -c smb -n kube-system > csi-smb-controller-1.log
$ kubectl get pod -n kube-system | grep csi-smb-node
rrr-csi-smb-node-4dvcr                             3/3     Running     9          8d
$ kubectl logs rrr-csi-smb-node-4dvcr -c smb -n kube-system > cs-smb-node-1.log
$ kubectl exec -it rrr-csi-smb-node-4dvcr -c smb -n kube-system -- mount | grep cifs
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/long-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/pods/f6a86e91-52c7-4564-b531-7eac2540715e/volumes/kubernetes.io~csi/long-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)

Check the host OS mounts:

$ mount | grep testshare
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/long-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/pods/f6a86e91-52c7-4564-b531-7eac2540715e/volumes/kubernetes.io~csi/long-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
$ date -u
Wed 22 Sep 11:33:17 UTC 2021

Install the short-lived deployment:

$ kubectl apply -f short-lived-secret.yaml
$ kubectl apply -f short-lived-pv.yaml
$ kubectl apply -f short-lived-pvc.yaml
$ kubectl apply -f short-lived-deployment.yaml

Capture logs 2:

$ kubectl get pod -n kube-system | grep csi-smb-controller
rrr-csi-smb-controller-54958bc88c-9h7bg            3/3     Running     9          8d
$ kubectl logs rrr-csi-smb-controller-54958bc88c-9h7bg -c smb -n kube-system > csi-smb-controller-2.log
$ kubectl get pod -n kube-system | grep csi-smb-node
rrr-csi-smb-node-4dvcr                             3/3     Running     9          8d
$ kubectl logs rrr-csi-smb-node-4dvcr -c smb -n kube-system > cs-smb-node-2.log
$ kubectl exec -it rrr-csi-smb-node-4dvcr -n kube-system -c smb -- mount | grep cifs
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/long-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/pods/f6a86e91-52c7-4564-b531-7eac2540715e/volumes/kubernetes.io~csi/long-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/short-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/pods/eba5a0dd-2148-40c3-98b1-051373948e02/volumes/kubernetes.io~csi/short-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)

Check the host OS mounts:

$ mount | grep testshare
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/long-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/pods/f6a86e91-52c7-4564-b531-7eac2540715e/volumes/kubernetes.io~csi/long-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/short-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/pods/eba5a0dd-2148-40c3-98b1-051373948e02/volumes/kubernetes.io~csi/short-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
$ date -u
Wed 22 Sep 11:36:07 UTC 2021

Uninstall the short-lived deployment:

$ kubectl delete deployment short-lived-deployment
$ kubectl delete pvc short-lived-pvc (hung)
$ kubectl delete pv short-lived-pv
$ kubectl delete secret short-lived-secret

Capture logs 3:

$ kubectl get pod -n kube-system | grep csi-smb-controller
rrr-csi-smb-controller-54958bc88c-9h7bg            3/3     Running     9          8d
$ kubectl logs rrr-csi-smb-controller-54958bc88c-9h7bg -c smb -n kube-system > csi-smb-controller-3.log
$ kubectl get pod -n kube-system | grep csi-smb-node
rrr-csi-smb-node-4dvcr                             3/3     Running     9          8d
$ kubectl logs rrr-csi-smb-node-4dvcr -c smb -n kube-system > cs-smb-node-3.log
$ kubectl exec -it rrr-csi-smb-node-4dvcr -n kube-system -c smb -- mount | grep cifs
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/long-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/pods/f6a86e91-52c7-4564-b531-7eac2540715e/volumes/kubernetes.io~csi/long-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/short-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)

Check the host OS mounts, unexpected result 1 (host OS mount point for short-lived PV is not unmounted):

$ mount | grep testshare
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/long-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/pods/f6a86e91-52c7-4564-b531-7eac2540715e/volumes/kubernetes.io~csi/long-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/short-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
$ date -u
Wed 22 Sep 11:39:43 UTC 2021

Reinstall the short-lived deployment with incorrect password:

$ kubectl apply -f short-lived-invalid-password-secret.yaml
$ kubectl apply -f short-lived-pv.yaml
$ kubectl apply -f short-lived-pvc.yaml
$ kubectl apply -f short-lived-deployment.yaml

Capture logs 4:

$ kubectl get pod -n kube-system | grep csi-smb-controller
rrr-csi-smb-controller-54958bc88c-9h7bg            3/3     Running     9          8d
$ kubectl logs rrr-csi-smb-controller-54958bc88c-9h7bg -c smb -n kube-system > csi-smb-controller-4.log
$ kubectl get pod -n kube-system | grep csi-smb-node
rrr-csi-smb-node-4dvcr                             3/3     Running     9          8d
$ kubectl logs rrr-csi-smb-node-4dvcr -c smb -n kube-system > cs-smb-node-4.log
$ kubectl exec -it rrr-csi-smb-node-4dvcr -n kube-system -c smb -- mount | grep cifs
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/long-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/pods/f6a86e91-52c7-4564-b531-7eac2540715e/volumes/kubernetes.io~csi/long-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/short-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/pods/02b0403e-6918-4869-a1f5-ce7bc8904b37/volumes/kubernetes.io~csi/short-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)

Unexpected result 2 (pod has access to network share with incorrect password in secret):

$ mount | grep testshare
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/long-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/pods/f6a86e91-52c7-4564-b531-7eac2540715e/volumes/kubernetes.io~csi/long-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/short-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/pods/02b0403e-6918-4869-a1f5-ce7bc8904b37/volumes/kubernetes.io~csi/short-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
$ kubectl get pods
NAME                                      READY   STATUS    RESTARTS   AGE
long-lived-deployment-547474ff76-mrlmm    1/1     Running   0          13m
short-lived-deployment-6d48cdc984-mjj8s   1/1     Running   0          3m15s
$ kubectl exec -it short-lived-deployment-6d48cdc984-mjj8s -- ls /mnt/smb
test.txt

Please let me know if you require any more information. Thank you very much for your help.

Best regards,
@snazzysoftware (Adam)
logs.zip

@andyzhangx
Member

I did not find /csi.v1.Node/NodeUnstageVolume logs in the 4 node driver logs. I am wondering whether it is due to these steps:

$ kubectl delete deployment short-lived-deployment
$ kubectl delete pvc short-lived-pvc (hung)
$ kubectl delete pv short-lived-pv
$ kubectl delete secret short-lived-secret

Can you delete only the deployment and check whether the short-lived cifs mount is still there?

$ kubectl delete deployment short-lived-deployment

@snazzysoftware
Author

Hi @andyzhangx ,

I've run the bug reproduction steps above with your suggested modification to only delete the short-lived deployment. Please see the following test transcript that shows the commands I ran and the main timestamps. Please also find attached a zip of all the logs.

Test Transcript

$ date -u
Thu 23 Sep 10:16:18 UTC 2021

Install the long-lived deployment:

$ kubectl apply -f long-lived-secret.yaml
secret/long-lived-secret created
$ kubectl apply -f long-lived-pv.yaml
persistentvolume/long-lived-pv created
$ kubectl apply -f long-lived-pvc.yaml
persistentvolumeclaim/long-lived-pvc created
$ kubectl apply -f long-lived-deployment.yaml
deployment.apps/long-lived-deployment created

Capture logs 1:

$ kubectl get pod -n kube-system | grep csi-smb-controller
rrr-csi-smb-controller-54958bc88c-9h7bg            3/3     Running     12         9d
$ kubectl logs rrr-csi-smb-controller-54958bc88c-9h7bg -c smb -n kube-system > csi-smb-controller-1.log
$ kubectl get pod -n kube-system | grep csi-smb-node
rrr-csi-smb-node-4dvcr                             3/3     Running     12         9d
$ kubectl logs rrr-csi-smb-node-4dvcr -c smb -n kube-system > cs-smb-node-1.log
$ kubectl exec -it rrr-csi-smb-node-4dvcr -c smb -n kube-system -- mount | grep cifs
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/long-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/pods/ad8f67bf-f29b-47d3-87b5-6747c8fca5c4/volumes/kubernetes.io~csi/long-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)

Check the host OS mounts:

$ mount | grep testshare
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/long-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/pods/ad8f67bf-f29b-47d3-87b5-6747c8fca5c4/volumes/kubernetes.io~csi/long-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
$ date -u
Thu 23 Sep 10:19:18 UTC 2021

Install the short-lived deployment:

$ kubectl apply -f short-lived-secret.yaml
secret/short-lived-secret created
$ kubectl apply -f short-lived-pv.yaml
persistentvolume/short-lived-pv created
$ kubectl apply -f short-lived-pvc.yaml
persistentvolumeclaim/short-lived-pvc created
$ kubectl apply -f short-lived-deployment.yaml
deployment.apps/short-lived-deployment created

Capture logs 2:

$ kubectl get pod -n kube-system | grep csi-smb-controller
rrr-csi-smb-controller-54958bc88c-9h7bg            3/3     Running     12         9d
$ kubectl logs rrr-csi-smb-controller-54958bc88c-9h7bg -c smb -n kube-system > csi-smb-controller-2.log
$ kubectl get pod -n kube-system | grep csi-smb-node
rrr-csi-smb-node-4dvcr                             3/3     Running     12         9d
$ kubectl logs rrr-csi-smb-node-4dvcr -c smb -n kube-system > cs-smb-node-2.log
$ kubectl exec -it rrr-csi-smb-node-4dvcr -n kube-system -c smb -- mount | grep cifs
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/long-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/pods/ad8f67bf-f29b-47d3-87b5-6747c8fca5c4/volumes/kubernetes.io~csi/long-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/short-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/pods/21b9e141-84ec-4117-a87a-88f7f1544969/volumes/kubernetes.io~csi/short-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)

Check the host OS mounts:

$ mount | grep testshare
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/long-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/pods/ad8f67bf-f29b-47d3-87b5-6747c8fca5c4/volumes/kubernetes.io~csi/long-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/short-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/pods/21b9e141-84ec-4117-a87a-88f7f1544969/volumes/kubernetes.io~csi/short-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
$ date -u
Thu 23 Sep 10:21:28 UTC 2021

Modified procedure to only uninstall the short-lived deployment:

$ kubectl delete deployment short-lived-deployment
deployment.apps "short-lived-deployment" deleted

Capture logs 3:

$ kubectl get pod -n kube-system | grep csi-smb-controller
rrr-csi-smb-controller-54958bc88c-9h7bg            3/3     Running     12         9d
$ kubectl logs rrr-csi-smb-controller-54958bc88c-9h7bg -c smb -n kube-system > csi-smb-controller-3.log
$ kubectl get pod -n kube-system | grep csi-smb-node
rrr-csi-smb-node-4dvcr                             3/3     Running     12         9d
$ kubectl logs rrr-csi-smb-node-4dvcr -c smb -n kube-system > cs-smb-node-3.log
$ kubectl exec -it rrr-csi-smb-node-4dvcr -n kube-system -c smb -- mount | grep cifs
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/long-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/pods/ad8f67bf-f29b-47d3-87b5-6747c8fca5c4/volumes/kubernetes.io~csi/long-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/short-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)

Check the host OS mounts, unexpected result 1 (host OS mount point for short-lived PV is not unmounted):

$ mount | grep testshare
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/long-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/pods/ad8f67bf-f29b-47d3-87b5-6747c8fca5c4/volumes/kubernetes.io~csi/long-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/short-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
$ date -u
Thu 23 Sep 10:23:26 UTC 2021

Reinstall the short-lived deployment with incorrect password:

$ kubectl apply -f short-lived-invalid-password-secret.yaml
secret/short-lived-secret configured
$ kubectl apply -f short-lived-pv.yaml
persistentvolume/short-lived-pv unchanged
$ kubectl apply -f short-lived-pvc.yaml
persistentvolumeclaim/short-lived-pvc unchanged
$ kubectl apply -f short-lived-deployment.yaml
deployment.apps/short-lived-deployment created

Capture logs 4:

$ kubectl get pod -n kube-system | grep csi-smb-controller
rrr-csi-smb-controller-54958bc88c-9h7bg            3/3     Running     12         9d
$ kubectl logs rrr-csi-smb-controller-54958bc88c-9h7bg -c smb -n kube-system > csi-smb-controller-4.log
$ kubectl get pod -n kube-system | grep csi-smb-node
rrr-csi-smb-node-4dvcr                             3/3     Running     12         9d
$ kubectl logs rrr-csi-smb-node-4dvcr -c smb -n kube-system > cs-smb-node-4.log
$ kubectl exec -it rrr-csi-smb-node-4dvcr -n kube-system -c smb -- mount | grep cifs
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/long-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/pods/ad8f67bf-f29b-47d3-87b5-6747c8fca5c4/volumes/kubernetes.io~csi/long-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/short-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/pods/94aaf40a-4358-485e-b642-f57111359fb2/volumes/kubernetes.io~csi/short-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)

Unexpected result 2 (pod has access to network share with incorrect password in secret):

$ mount | grep testshare
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/long-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/pods/ad8f67bf-f29b-47d3-87b5-6747c8fca5c4/volumes/kubernetes.io~csi/long-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/short-lived-pv/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
//10.44.131.76/testshare on /var/lib/kubelet/pods/94aaf40a-4358-485e-b642-f57111359fb2/volumes/kubernetes.io~csi/short-lived-pv/mount type cifs (rw,relatime,vers=3.0,cache=strict,username=testuser,domain=RGH_NAS,uid=0,noforceuid,gid=0,noforcegid,addr=10.44.131.76,file_mode=0777,dir_mode=0777,seal,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
$ kubectl get pods
NAME                                      READY   STATUS    RESTARTS   AGE
long-lived-deployment-547474ff76-r2rqq    1/1     Running   0          10m
short-lived-deployment-6d48cdc984-dkg9n   1/1     Running   0          2m3s
$ kubectl exec -it short-lived-deployment-6d48cdc984-dkg9n -- ls /mnt/smb
test.txt

Please let me know if you require any more information. Thank you very much for your help.

Best regards,
@snazzysoftware (Adam)

logs.zip

@andyzhangx
Member

andyzhangx commented Sep 23, 2021

Hi @snazzysoftware, when you delete the long-lived deployment, does the cifs mount of the long-lived PV still exist?

Also, I am not sure whether it's related, but can you remove storageClassName and reset volumeHandle in short-lived-pv? I am not sure whether there are orphaned pods on your existing node; maybe you should try on a new node with exactly these two PV/PVC configs:

@andyzhangx
Member

Never mind, I got the repro on a v1.21.1 cluster: NodeUnstageVolume is never invoked after the deployment is deleted. Will check how to solve this issue.

@andyzhangx
Member

andyzhangx commented Sep 24, 2021

I got the root cause after checking the kubelet logs: during operationExecutor.UnmountDevice, the kubelet checks whether there are other paths referencing the device mount path, and unfortunately GetDeviceMountRefs always returns references from the other PV as well, since both PVs use the same network address, e.g.

# mount | grep cifs | uniq | grep smb-server
//smb-server.default.svc.cluster.local/share on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-smb/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=USERNAME,uid=0,noforceuid,gid=0,noforcegid,addr=10.0.14.5,file_mode=0777,dir_mode=0777,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1)
//smb-server.default.svc.cluster.local/share on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-smb2/globalmount type cifs (rw,relatime,vers=3.0,cache=strict,username=USERNAME,uid=0,noforceuid,gid=0,noforcegid,addr=10.0.14.5,file_mode=0777,dir_mode=0777,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1)

I think we may need to filter out the case where two references are in the same directory tree, in the k8s upstream operationExecutor.UnmountDevice code.

  • related kubelet logs
Sep 24 03:56:48 aks-agentpool-27620407-vmss000003 kubelet[2735]: I0924 03:56:48.825289    2735 reconciler.go:312] "operationExecutor.UnmountDevice started for volume \"pv-smb2\" (UniqueName: \"kubernetes.io/csi/smb.csi.k8s.io^unique-volumeid2\") on node \"aks-agentpool-27620407-vmss000003\" "
Sep 24 03:56:48 aks-agentpool-27620407-vmss000003 kubelet[2735]: I0924 03:56:48.825380    2735 reconciler.go:319] "Volume detached for volume \"kube-api-access-smzqv\" (UniqueName: \"kubernetes.io/projected/5d429fd5-ee9c-4a38-b2f2-c35450560db1-kube-api-access-smzqv\") on node \"aks-agentpool-27620407-vmss000003\" DevicePath \"\""
Sep 24 03:56:48 aks-agentpool-27620407-vmss000003 kubelet[2735]: E0924 03:56:48.828882    2735 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/smb.csi.k8s.io^unique-volumeid2 podName: nodeName:}" failed. No retries permitted until 2021-09-24 03:56:49.328833887 +0000 UTC m=+1432250.429266327 (durationBeforeRetry 500ms). Error: "GetDeviceMountRefs check failed for volume \"pv-smb2\" (UniqueName: \"kubernetes.io/csi/smb.csi.k8s.io^unique-volumeid2\") on node \"aks-agentpool-27620407-vmss000003\" : the device mount path \"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-smb2/globalmount\" is still mounted by other references [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-smb/globalmount /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-smb/globalmount /var/lib/kubelet/pods/2a9e47b5-ab8e-463b-bb98-32127f778c65/volumes/kubernetes.io~csi/pv-smb/mount /var/lib/kubelet/pods/2a9e47b5-ab8e-463b-bb98-32127f778c65/volumes/kubernetes.io~csi/pv-smb/mount /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-smb2/globalmount]"
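
To make the trigger concrete: the repro boils down to two PVs that differ only in name and volumeHandle but point at the same share, so each PV's globalmount shows up as a mount reference for the other. A minimal sketch, using the PV names and share address visible in the mount output and kubelet logs above (the secret name and capacity are assumptions):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-smb
spec:
  capacity:
    storage: 100Gi                     # assumed size
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: smb.csi.k8s.io
    volumeHandle: unique-volumeid
    volumeAttributes:
      source: "//smb-server.default.svc.cluster.local/share"
    nodeStageSecretRef:
      name: smbcreds                   # assumed secret name
      namespace: default
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-smb2
spec:
  capacity:
    storage: 100Gi                     # assumed size
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: smb.csi.k8s.io
    volumeHandle: unique-volumeid2     # different handle from pv-smb
    volumeAttributes:
      source: "//smb-server.default.svc.cluster.local/share"   # same source as pv-smb
    nodeStageSecretRef:
      name: smbcreds                   # assumed secret name
      namespace: default

When the pod using pv-smb2 is deleted, GetDeviceMountRefs on pv-smb2's globalmount still finds pv-smb's mounts of the same share, so the kubelet never calls NodeUnstageVolume and the stale globalmount is left behind.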

@andyzhangx
Member

The related code is here; I am not sure whether there is a good way to fix this issue:

// GetMountRefs finds all mount references to pathname, returns a
// list of paths. Path could be a mountpoint or a normal
// directory (for bind mount).
func (mounter *Mounter) GetMountRefs(pathname string) ([]string, error) {
	pathExists, pathErr := PathExists(pathname)
	if !pathExists {
		return []string{}, nil
	} else if IsCorruptedMnt(pathErr) {
		klog.Warningf("GetMountRefs found corrupted mount at %s, treating as unmounted path", pathname)
		return []string{}, nil
	} else if pathErr != nil {
		return nil, fmt.Errorf("error checking path %s: %v", pathname, pathErr)
	}
	realpath, err := filepath.EvalSymlinks(pathname)
	if err != nil {
		return nil, err
	}
	return SearchMountPoints(realpath, procMountInfoPath)
}

https://github.com/kubernetes/kubernetes/blob/7bff8adaf683dc7e25b5548e2c16e7393ff8a036/staging/src/k8s.io/mount-utils/mount_linux.go#L354-L372

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label (Denotes an issue or PR has remained open with no activity and has become stale.) on Dec 23, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label (Denotes an issue or PR that has aged beyond stale and will be auto-closed.) and removed the lifecycle/stale label (Denotes an issue or PR has remained open with no activity and has become stale.) on Jan 22, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
