NFSv4.2 is broken across different hosts #1565
Comments
Just repeated the test with 3975.2.2 and it is also affected. Edit: https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.6.55 contains a few NFS-related fixes, but it's unclear to me whether that would resolve this issue.
Hello, I could reproduce this behaviour on a two node ARM64 Flatcar latest alpha (4116.0.0) env. I think the issue is related to a host problem as the files from the secondary host are empty as shown from the host perspective:
|
Flatcar main with kernel |
Tested Flatcar with kernel 6.10.9 and the issue is present there too, so this looks like a Linux kernel regression, or possibly a tooling / containerd issue. It needs to be reproduced outside of k8s first to better pinpoint the actual cause.
torvalds/linux@9cf2744#diff-a24af2ce5442597efe8051684905db2be615f41703247fbce9a446e77f2e9587R214 -> in the Linux tree, this is the only change I can see that might affect NFS in 6.6 or 6.10 compared with earlier kernels.
I'm wondering whether the underlying issue might be a bug in the ganesha NFS server that is now exposed by the read_plus default change. Edit: I did some additional testing and the output of |
I am now trying to build a kernel with read_plus disabled; let's see how that goes.
We might then want to update our NFS test to catch further regressions like this (https://github.com/flatcar/mantle/blob/02348d65a5f9bd72f3e7412da54a688b7f972790/kola/tests/kubeadm/kubeadm.go#L237).
Tested with |
Normally https://lore.kernel.org/linux-nfs and/or the upstream for the server implementation, but... the nfs-ganesha-server-and-external-provisioner repo (https://github.com/kubernetes-sigs/nfs-ganesha-server-and-external-provisioner) is still on Ganesha V4.0.8, whereas upstream just released V6. So there is some reason to think that this might be fixed in newer versions.
kubernetes-sigs/nfs-ganesha-server-and-external-provisioner#152 -> there is a PR to update the chart to use Ganesha v6.
nfs-ganesha/nfs-ganesha@24da5c3#diff-d4e3191eebe00b04019cafa02691fef13becc8cb3cc098ae6c177653cea40561R776 -> this commit is the most likely candidate for a fix for this issue.
Disable CONFIG_NFS_V4_2_READ_PLUS kernel config, as Linux kernel >= 6.6 enabled the CONFIG_NFS_V4_2_READ_PLUS config option by default, and nfs-ganesha version <= 6.1 is broken due to mishandling of the read_plus operation. See: nfs-ganesha/nfs-ganesha@24da5c3 See: flatcar/Flatcar#1565 See: nfs-ganesha/nfs-ganesha#1188
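To check whether a given kernel build still has the option from the commit message above enabled, something like the following sketch may help. The function name `read_plus_state` is hypothetical, and the usage line assumes the running kernel exposes its config at `/proc/config.gz`, which is not guaranteed on every build.

```shell
# Hypothetical helper: reads kernel config text on stdin and prints
# "enabled", "disabled", or "unknown" for CONFIG_NFS_V4_2_READ_PLUS.
read_plus_state() {
  awk '
    /^CONFIG_NFS_V4_2_READ_PLUS=y$/            { state = "enabled" }
    /^# CONFIG_NFS_V4_2_READ_PLUS is not set$/ { state = "disabled" }
    END { print (state == "" ? "unknown" : state) }
  '
}

# Usage (assumption: /proc/config.gz exists on the host):
#   zcat /proc/config.gz | read_plus_state
```

"unknown" means the option does not appear in the config at all, which would be expected on kernels that predate it or that were built without NFSv4.2 support.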
Is there any update here yet?
Hello, the just-released Flatcar alpha/beta/stable versions from https://www.flatcar.org/releases include the Linux kernel fix.
I just gave the new Flatcar version a try and NFS works again. Thanks!
Description
With Flatcar 3975.2.1 we see very weird NFS 4.2 behavior: one pod writes a file, but a pod on a different host is unable to see the just-written file content.
NFS 3 / 4.1 works as expected. (Haven't tested 4.0). Flatcar 3815.2.5 is also unaffected.
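Since the behavior depends on the negotiated NFS version, it can be useful to confirm which version a mount actually ended up with. The sketch below parses `/proc/mounts`-style text; the function name `nfs_vers_of` is hypothetical, and the `/test` mount point in the usage line is taken from the reproduction steps in this issue.

```shell
# Hypothetical helper: reads /proc/mounts-style text on stdin and prints the
# value of the vers= option for the NFS mount at the mount point given as $1.
nfs_vers_of() {
  awk -v mp="$1" '
    $2 == mp && $3 ~ /^nfs/ {
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++)
        if (opts[i] ~ /^vers=/) { sub(/^vers=/, "", opts[i]); print opts[i] }
    }
  '
}

# Usage (run inside the pod or on the host that holds the mount):
#   nfs_vers_of /test < /proc/mounts
```

An empty result means no NFS mount was found at that path, or that its options carry no `vers=` entry.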
Impact
NFS 4.2 mount is unusable.
Environment and steps to reproduce
Update mount options in `StorageClass` `nfs`
create pvc
create pods (must be executed on different hosts)
a. `kubectl exec -it test-pod-1 -- bash -c 'echo "def" > /test/testfile'`
b. `kubectl exec -it test-pod-2 -- bash -c 'cat /test/testfile'`
`cat` should return "def", but returns nothing. Note that both pods see accurate metadata (using `ls -la /test`) for the file.
Expected behavior
`cat` from `test-pod-2` should be able to read the just written file content. Note that `test-pod-1` is able to read the file contents.
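The write-then-read check from the reproduction steps can be wrapped in a small script so it is easy to repeat across Flatcar versions. This is only a sketch: `check_cross_host_read` is a hypothetical name, it assumes `kubectl` is already pointed at the right cluster and that both pods exist on different hosts, and it drops `-it` because no interactive terminal is needed.

```shell
# Hypothetical helper: write payload $4 to file $3 from pod $1, then read the
# file back from pod $2. Prints OK if the reader sees the payload, MISMATCH
# otherwise. Returns 2 if a kubectl call itself fails.
check_cross_host_read() {
  kubectl exec "$1" -- bash -c "echo '$4' > '$3'" || return 2
  got=$(kubectl exec "$2" -- bash -c "cat '$3'") || return 2
  if [ "$got" = "$4" ]; then
    echo "OK"
  else
    echo "MISMATCH: expected '$4', got '$got'"
    return 1
  fi
}

# Usage, with the names from this issue:
#   check_cross_host_read test-pod-1 test-pod-2 /test/testfile def
```

On an affected setup the reader returns nothing, so the helper would be expected to report a mismatch with an empty `got` value.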