
iSCSI mount fails starting from open-iscsi-2.1.10 release #11708

Closed

flavio opened this issue Feb 4, 2025 · 4 comments


flavio commented Feb 4, 2025

Environmental Info:
K3s Version: k3s version v1.31.5+k3s1 (56ec5dd), go version go1.22.10

However, it also happens with older releases of k3s, such as v1.29.

Node(s) CPU architecture, OS, and Version:

Linux kubetest 6.4.0-150600.23.33-default #1 SMP PREEMPT_DYNAMIC Thu Jan 9 14:10:22 UTC 2025 (ba46628) x86_64 x86_64 x86_64 GNU/Linux

I ran into this issue after upgrading the host node from openSUSE Leap 15.5 to 15.6, which changed the version of open-iscsi:

  • 15.5: open-iscsi-2.1.9-150500.46.3.1.x86_64 (works)
  • 15.6: open-iscsi-2.1.10-150600.51.3.2.x86_64 (broken)

Cluster Configuration:

Single node cluster

Describe the bug:

After upgrading my host from openSUSE Leap 15.5 to 15.6, I discovered I was no longer able to mount iSCSI volumes inside of my cluster.

The pod that uses the iSCSI volume remains in the "ContainerCreating" phase forever. journalctl shows only this message coming from the k3s service:

Feb 04 07:14:04 kubetest k3s[1356]: E0204 07:14:04.361344    1356 pod_workers.go:1301] "Error syncing pod, skipping" err="unmounted volumes=[iscsipd-rw], unattached volumes=[], failed to process volumes=[]: context deadline exceeded" pod="default/iscsipd" podUID="238c1357-6008-4932-bac1-4701182d7494"
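
The pod events can also be queried from the Kubernetes side, although they carry nothing more actionable than the journal message above:

kubectl describe pod iscsipd
kubectl get events --field-selector involvedObject.name=iscsipd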

The node has successfully logged into the remote target:

iscsiadm -m session
tcp: [1] 192.168.40.6:3260,1 iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754 (non-flash)
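
Session details, including any SCSI devices attached to the session, can be printed with:

iscsiadm -m session -P 3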

However, no block device has been created:

 lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sr0     11:0    1  364K  0 rom
vda    254:0    0  9.5G  0 disk
├─vda1 254:1    0    2M  0 part
├─vda2 254:2    0   33M  0 part /boot/efi
└─vda3 254:3    0  9.5G  0 part /

The iSCSI target works fine when using iscsiadm manually:

# pod deleted, no active iscsi session
kubetest:/var/lib/rancher/k3s/data/current # iscsiadm -m discovery -t sendtargets -p 192.168.40.6
192.168.40.6:3260,1 iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754
192.168.40.6:3260,1 iqn.2004-04.com.qnap:ts-420:iscsi.postgres.d8e754
kubetest:/var/lib/rancher/k3s/data/current # iscsiadm -m node -T iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754 -p 192.168.40.6 -l
Logging in to [iface: default, target: iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754, portal: 192.168.40.6,3260]
Login to [iface: default, target: iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754, portal: 192.168.40.6,3260] successful.
kubetest:/var/lib/rancher/k3s/data/current # lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda      8:0    0    1G  0 disk
sr0     11:0    1  364K  0 rom
vda    254:0    0  9.5G  0 disk
├─vda1 254:1    0    2M  0 part
├─vda2 254:2    0   33M  0 part /boot/efi
└─vda3 254:3    0  9.5G  0 part /

As you can see, the sda device was properly created.

What is particularly strange is that everything is temporarily fixed by restarting the iscsid service, but the fix lasts only for a single mount of the target. More details below.

This is what happens when the iSCSI target is not using authentication; the same problem occurs when CHAP authentication is enabled.

I've run iscsid with debug messages turned on, but I didn't find any error. I've also changed the k3s service to start with --debug; however, nothing relevant about iSCSI was shown.
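
For reference, the debug run looked roughly like this (assuming socket-activated iscsid, as on openSUSE; stop the units first, then run the daemon in the foreground with a high debug level):

systemctl stop iscsid.socket iscsid.service
iscsid -f -d 8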

Everything works fine with openSUSE Leap 15.5. Between 15.5 and 15.6, open-iscsi received a small version bump:

  • 15.5: open-iscsi-2.1.9-150500.46.3.1.x86_64
  • 15.6: open-iscsi-2.1.10-150600.51.3.2.x86_64

The package changelog can be found here. I got suspicious about some of the changes, so I installed the open-iscsi-2.1.9 package on 15.6, and the bug disappeared.
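
For reference, the downgrade was done roughly like this (assuming the 2.1.9 RPM is available locally or from a configured repository; the exact version string may differ):

zypper install --oldpackage open-iscsi-2.1.9-150500.46.3.1.x86_64
systemctl restart iscsid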

Steps To Reproduce:

I've used VMs based on openSUSE-Leap-15.5-Minimal-VM.x86_64-Cloud.qcow2 and openSUSE-Leap-15.6-Minimal-VM.x86_64-Cloud.qcow2.

In both cases:

  • Boot the VM
  • zypper in open-iscsi kernel-default
  • zypper rm kernel-default-base
  • reboot
  • install k3s: curl -sfL https://get.k3s.io | sh -

Create the following pod:

apiVersion: v1
kind: Pod
metadata:
  name: iscsipd
spec:
  containers:
  - name: iscsipd-rw
    image: registry.k8s.io/pause:latest
    volumeMounts:
    - mountPath: "/mnt/iscsipd"
      name: iscsipd-rw
  volumes:
  - name: iscsipd-rw
    iscsi:
      targetPortal: 192.168.40.6:3260 # TO BE REPLACED
      iqn: iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754 # TO BE REPLACED
      lun: 0
      fsType: ext4
      readOnly: false
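
Apply it and watch the status (assuming the manifest was saved as iscsipd.yaml):

kubectl apply -f iscsipd.yaml
kubectl get pod iscsipd -w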

The pod will be stuck in "ContainerCreating" forever.

Now perform the following operations:

  • delete the pod
  • restart iscsid: systemctl restart iscsid
  • create the pod

This time the pod will be created.

Now, delete the pod and recreate it. The pod creation will be stuck in "ContainerCreating" again.
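
Spelled out as commands (again assuming the manifest is saved as iscsipd.yaml):

kubectl delete pod iscsipd
systemctl restart iscsid
kubectl apply -f iscsipd.yaml   # the pod reaches Running this time
kubectl delete pod iscsipd
kubectl apply -f iscsipd.yaml   # stuck in ContainerCreating again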

Expected behavior:

  • Pods with iSCSI volumes should not be stuck in "ContainerCreating" phase forever.
  • Log messages/kubectl describe pod should contain information that can be used to debug the issue.

Actual behavior:

  • The Pods are left in "ContainerCreating" phase forever
  • The problem can be fixed once by restarting the iscsid service, but it reappears if the pod is rescheduled

Additional context / logs:

These are the messages obtained by running journalctl -f | grep iscsi.

Node fresh from boot, Pod scheduled:

Feb 04 09:33:11 kubetest k3s[1346]: I0204 09:33:11.308667    1346 reconciler.go:360] "attacherDetacher.AttachVolume started" volumeName="kubernetes.io/iscsi/192.168.40.6:3260:iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754:0" nodeName="kubetest" scheduledPods=["default/iscsipd"]
Feb 04 09:33:11 kubetest k3s[1346]: I0204 09:33:11.308739    1346 operation_generator.go:317] AttachVolume.Attach succeeded for volume "iscsipd-rw" (UniqueName: "kubernetes.io/iscsi/192.168.40.6:3260:iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754:0") from node "kubetest"
Feb 04 09:33:11 kubetest k3s[1346]: I0204 09:33:11.439042    1346 reconciler_common.go:245] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-zfrsd\" (UniqueName: \"kubernetes.io/projected/8edd995a-9d2b-4999-b1af-b61cbb0108f7-kube-api-access-zfrsd\") pod \"iscsipd\" (UID: \"8edd995a-9d2b-4999-b1af-b61cbb0108f7\") " pod="default/iscsipd"
Feb 04 09:33:12 kubetest k3s[1346]: I0204 09:33:12.043634    1346 reconciler_common.go:245] "operationExecutor.VerifyControllerAttachedVolume started for volume \"iscsipd-rw\" (UniqueName: \"kubernetes.io/iscsi/192.168.40.6:3260:iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754:0\") pod \"iscsipd\" (UID: \"8edd995a-9d2b-4999-b1af-b61cbb0108f7\") " pod="default/iscsipd"
Feb 04 09:33:12 kubetest k3s[1346]: I0204 09:33:12.056737    1346 operation_generator.go:1491] "Controller attach succeeded for volume \"iscsipd-rw\" (UniqueName: \"kubernetes.io/iscsi/192.168.40.6:3260:iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754:0\") pod \"iscsipd\" (UID: \"8edd995a-9d2b-4999-b1af-b61cbb0108f7\") device path: \"\"" pod="default/iscsipd"
Feb 04 09:33:12 kubetest k3s[1346]: I0204 09:33:12.144783    1346 operation_generator.go:538] "MountVolume.WaitForAttach entering for volume \"iscsipd-rw\" (UniqueName: \"kubernetes.io/iscsi/192.168.40.6:3260:iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754:0\") pod \"iscsipd\" (UID: \"8edd995a-9d2b-4999-b1af-b61cbb0108f7\") DevicePath \"\"" pod="default/iscsipd"
Feb 04 09:33:12 kubetest systemd[1]: One time configuration for iscsi.service was skipped because of an unmet condition check (ConditionPathExists=!/etc/iscsi/initiatorname.iscsi).
Feb 04 09:33:12 kubetest kernel: iscsi: registered transport (tcp)
Feb 04 09:33:12 kubetest iscsid[3167]: iscsid: connection-1:0 Invalid timeo.noop_out_interval. Must be greater than zero. Using default 5.
Feb 04 09:33:12 kubetest iscsid[3167]: iscsid: Connection1:0 to [target: iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754, portal: 192.168.40.6,3260] through [iface: default] is operational now

The pod is in "ContainerCreating" status.

The pod is being deleted:

Feb 04 09:34:07 kubetest k3s[1346]: E0204 09:34:07.499740    1346 pod_workers.go:1301] "Error syncing pod, skipping" err="unmounted volumes=[iscsipd-rw], unattached volumes=[], failed to process volumes=[]: context canceled" pod="default/iscsipd" podUID="8edd995a-9d2b-4999-b1af-b61cbb0108f7"

The iscsid service is restarted via systemctl restart iscsid:

Feb 04 09:34:57 kubetest iscsid[3167]: iscsid shutting down.
Feb 04 09:34:57 kubetest systemd[1]: iscsid.service: Deactivated successfully.
Feb 04 09:34:57 kubetest k3s[1346]: E0204 09:34:57.928841    1346 iscsi_util.go:436] iscsi: failed to get any path for iscsi disk, last err seen:
Feb 04 09:34:57 kubetest k3s[1346]: iscsi: failed to attach disk: Error: iscsiadm: read error (0/2), daemon died?
Feb 04 09:34:57 kubetest k3s[1346]: iscsiadm: Could not login to [iface: default, target: iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754, portal: 192.168.40.6,3260].
Feb 04 09:34:57 kubetest k3s[1346]: iscsiadm: initiator reported error (18 - could not communicate to iscsid)
Feb 04 09:34:57 kubetest k3s[1346]: iscsiadm: Could not log into all portals
Feb 04 09:34:57 kubetest k3s[1346]: Logging in to [iface: default, target: iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754, portal: 192.168.40.6,3260]
Feb 04 09:34:57 kubetest k3s[1346]: E0204 09:34:57.928943    1346 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/iscsi/192.168.40.6:3260:iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754:0 podName: nodeName:}" failed. No retries permitted until 2025-02-04 09:34:58.428921564 +0000 UTC m=+164.955485664 (durationBeforeRetry 500ms). Error: MountVolume.WaitForAttach failed for volume "iscsipd-rw" (UniqueName: "kubernetes.io/iscsi/192.168.40.6:3260:iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754:0") pod "iscsipd" (UID: "8edd995a-9d2b-4999-b1af-b61cbb0108f7") : failed to get any path for iscsi disk, last err seen:
Feb 04 09:34:57 kubetest k3s[1346]: iscsi: failed to attach disk: Error: iscsiadm: read error (0/2), daemon died?
Feb 04 09:34:57 kubetest k3s[1346]: iscsiadm: Could not login to [iface: default, target: iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754, portal: 192.168.40.6,3260].
Feb 04 09:34:57 kubetest k3s[1346]: iscsiadm: initiator reported error (18 - could not communicate to iscsid)
Feb 04 09:34:57 kubetest k3s[1346]: iscsiadm: Could not log into all portals
Feb 04 09:34:57 kubetest k3s[1346]: Logging in to [iface: default, target: iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754, portal: 192.168.40.6,3260]
Feb 04 09:34:57 kubetest systemd[1]: One time configuration for iscsi.service was skipped because of an unmet condition check (ConditionPathExists=!/etc/iscsi/initiatorname.iscsi).
Feb 04 09:34:57 kubetest k3s[1346]: I0204 09:34:57.946055    1346 reconciler_common.go:293] "Volume detached for volume \"iscsipd-rw\" (UniqueName: \"kubernetes.io/iscsi/192.168.40.6:3260:iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754:0\") on node \"kubetest\" DevicePath \"\""
Feb 04 09:34:57 kubetest iscsid[3437]: iscsid: connection1:0 Invalid timeo.noop_out_interval. Must be greater than zero. Using default 5.
Feb 04 09:34:59 kubetest iscsid[3437]: iscsid: connection1:0 is operational after recovery (1 attempts)
Feb 04 09:35:04 kubetest k3s[1346]: I0204 09:35:04.804737    1346 reconciler.go:275] "attacherDetacher.DetachVolume started" node="kubetest" volumeName="kubernetes.io/iscsi/192.168.40.6:3260:iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754:0"
Feb 04 09:35:04 kubetest k3s[1346]: I0204 09:35:04.816009    1346 operation_generator.go:1539] Verified volume is safe to detach for volume "iscsipd-rw" (UniqueName: "kubernetes.io/iscsi/192.168.40.6:3260:iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754:0") on node "kubetest"
Feb 04 09:35:04 kubetest k3s[1346]: I0204 09:35:04.816727    1346 operation_generator.go:437] DetachVolume.Detach succeeded for volume "iscsipd-rw" (UniqueName: "kubernetes.io/iscsi/192.168.40.6:3260:iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754:0") on node "kubetest"

The pod is created again:

Feb 04 09:35:29 kubetest k3s[1346]: I0204 09:35:29.266360    1346 reconciler.go:360] "attacherDetacher.AttachVolume started" volumeName="kubernetes.io/iscsi/192.168.40.6:3260:iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754:0" nodeName="kubetest" scheduledPods=["default/iscsipd"]
Feb 04 09:35:29 kubetest k3s[1346]: I0204 09:35:29.266612    1346 operation_generator.go:317] AttachVolume.Attach succeeded for volume "iscsipd-rw" (UniqueName: "kubernetes.io/iscsi/192.168.40.6:3260:iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754:0") from node "kubetest"
Feb 04 09:35:29 kubetest k3s[1346]: I0204 09:35:29.365279    1346 reconciler_common.go:245] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-fzjgk\" (UniqueName: \"kubernetes.io/projected/f88215b8-e0e1-49e2-88b4-ba9f6e7d4315-kube-api-access-fzjgk\") pod \"iscsipd\" (UID: \"f88215b8-e0e1-49e2-88b4-ba9f6e7d4315\") " pod="default/iscsipd"
Feb 04 09:35:35 kubetest k3s[1346]: I0204 09:35:35.612606    1346 reconciler_common.go:245] "operationExecutor.VerifyControllerAttachedVolume started for volume \"iscsipd-rw\" (UniqueName: \"kubernetes.io/iscsi/192.168.40.6:3260:iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754:0\") pod \"iscsipd\" (UID: \"f88215b8-e0e1-49e2-88b4-ba9f6e7d4315\") " pod="default/iscsipd"
Feb 04 09:35:35 kubetest k3s[1346]: I0204 09:35:35.616174    1346 operation_generator.go:1491] "Controller attach succeeded for volume \"iscsipd-rw\" (UniqueName: \"kubernetes.io/iscsi/192.168.40.6:3260:iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754:0\") pod \"iscsipd\" (UID: \"f88215b8-e0e1-49e2-88b4-ba9f6e7d4315\") device path: \"\"" pod="default/iscsipd"
Feb 04 09:35:35 kubetest k3s[1346]: I0204 09:35:35.713975    1346 operation_generator.go:538] "MountVolume.WaitForAttach entering for volume \"iscsipd-rw\" (UniqueName: \"kubernetes.io/iscsi/192.168.40.6:3260:iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754:0\") pod \"iscsipd\" (UID: \"f88215b8-e0e1-49e2-88b4-ba9f6e7d4315\") DevicePath \"\"" pod="default/iscsipd"
Feb 04 09:35:36 kubetest k3s[1346]: I0204 09:35:36.748164    1346 operation_generator.go:548] "MountVolume.WaitForAttach succeeded for volume \"iscsipd-rw\" (UniqueName: \"kubernetes.io/iscsi/192.168.40.6:3260:iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754:0\") pod \"iscsipd\" (UID: \"f88215b8-e0e1-49e2-88b4-ba9f6e7d4315\") DevicePath \"/dev/disk/by-path/ip-192.168.40.6:3260-iscsi-iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754-lun-0\"" pod="default/iscsipd"
Feb 04 09:35:36 kubetest systemd[1]: Started Kubernetes transient mount for /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/192.168.40.6:3260-iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754-lun-0.
Feb 04 09:35:36 kubetest k3s[1346]: I0204 09:35:36.997543    1346 operation_generator.go:580] "MountVolume.MountDevice succeeded for volume \"iscsipd-rw\" (UniqueName: \"kubernetes.io/iscsi/192.168.40.6:3260:iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754:0\") pod \"iscsipd\" (UID: \"f88215b8-e0e1-49e2-88b4-ba9f6e7d4315\") device mount path \"/var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/192.168.40.6:3260-iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754-lun-0\"" pod="default/iscsipd"
Feb 04 09:35:38 kubetest k3s[1346]: I0204 09:35:38.650169    1346 pod_startup_latency_tracker.go:104] "Observed pod startup duration" pod="default/iscsipd" podStartSLOduration=9.305719225 podStartE2EDuration="9.65012584s" podCreationTimestamp="2025-02-04 09:35:29 +0000 UTC" firstStartedPulling="2025-02-04 09:35:37.231156238 +0000 UTC m=+203.757720317" lastFinishedPulling="2025-02-04 09:35:37.575562828 +0000 UTC m=+204.102126932" observedRunningTime="2025-02-04 09:35:38.648810025 +0000 UTC m=+205.175374151" watchObservedRunningTime="2025-02-04 09:35:38.65012584 +0000 UTC m=+205.176689955"
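
For completeness, the resulting device mount can be double-checked on the host (path taken from the MountDevice log line above):

findmnt /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/192.168.40.6:3260-iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754-lun-0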

brandond commented Feb 4, 2025

> I ran into this issue after upgrading the host node from openSUSE Leap 15.5 to 15.6, which changed the version of open-iscsi:
>
> • 15.5: open-iscsi-2.1.9-150500.46.3.1.x86_64 (works)
> • 15.6: open-iscsi-2.1.10-150600.51.3.2.x86_64 (broken)

Have you opened an openSUSE issue? Based on the information you've shared, I don't see that there is anything to fix in k3s. The only involvement we have here is that the kubelet runs mount commands on the host. Everything beyond that point is the responsibility of the host OS to work properly.
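
For context, the kubelet's in-tree iSCSI volume plugin essentially shells out to iscsiadm and then waits for the device node to appear; a simplified sketch based on the log lines above (not the exact invocation, which varies by Kubernetes version):

iscsiadm -m node -T iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754 -p 192.168.40.6:3260 --login
# the kubelet then polls for this by-path device to show up:
ls /dev/disk/by-path/ip-192.168.40.6:3260-iscsi-iqn.2004-04.com.qnap:ts-420:iscsi.mariadb.d8e754-lun-0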

brandond closed this as completed on Feb 4, 2025

flavio commented Feb 4, 2025

@brandond can you please re-open the issue?

The bug is not caused by openSUSE's open-iscsi packaging, but rather by changes made upstream in open-iscsi (sorry, I think I confused you by linking the package changelog instead of upstream's changelog).

I can reproduce the same behaviour on a Debian system:

  • Debian stable has version 2.1.8: everything works fine
  • Debian unstable has version 2.1.10: the bug happens

On Debian unstable I also downgraded open-iscsi from 2.1.10 to 2.1.8, and everything started working as expected.
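
Roughly (use apt-cache policy to list the versions actually available; the exact version string is distribution-specific):

apt-cache policy open-iscsi
apt install open-iscsi=<2.1.8 version from the list>
systemctl restart iscsid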

I think the real question is whether this is a bug in k3s or a bug in upstream Kubernetes.


brandond commented Feb 4, 2025

K3s is Kubernetes. We do not carry any patches to the kubelet's volume mount code, and all the kubelet does is run mount and iscsiadm commands. If open-iscsi has made changes that cause previously working mount commands to fail, I would probably consider that a bug on their side. If this is an intentionally breaking change that will require a change to the commands executed by the kubelet, that will need to be fixed upstream, not here in K3s.


flavio commented Feb 4, 2025

Thanks for the clarification about the volume mount code.

In the meantime, I was trying to reproduce the issue with something other than k3s, and I was able to reproduce it with RKE2.

I'm going to open an issue against kubernetes/kubernetes.
