Linux 6.9: delegate/zfs_allow_010_pos hangs in zfs create -V #16089
Retested with 6.9-rc7 (likely the final 6.9-rc), no change.
I did look into this a bit last night, though ultimately didn't get anywhere. There's more change in Linux in … Incidentally, I think my shim for … If anyone knows this better than me and is in the mood, feel free to take a look. I'm not reserving this issue for myself; I'm just aware that 6.9 will drop soon and it'd be nice for things to work properly there. I'll keep noodling on this as time permits, probably a couple of hours each weekend :)
Confirmed in 6.9.0, not that there was a doubt.
I guess it would be advisable to avoid 6.9 until this gets fixed, right?
6.8 is now EOL, so this is kind of unfortunate... |
I'm able to reproduce this on Ubuntu 24 using the prebuilt 6.9.3 kernel debs from https://kernel.ubuntu.com/mainline/v6.9.3/. I noticed this in dmesg when I run …

I'm using …

Never mind, this may just be UBSAN noise.
The 6.9 kernel behaves differently in how it releases block devices. In the common case it will asynchronously release the device only after the return to userspace. This is different from the 6.8 and older kernels, which release the block devices synchronously. To get around this, call add_disk() from a workqueue so that the kernel uses a different codepath to release our zvols in the way we expect. This stops zfs_allow_010_pos from hanging.

Fixes: openzfs#16089
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Fix is here: #16282

TL;DR: The 6.9 kernel asynchronously releases block devices at return-to-userspace time, while 6.8 and older release them synchronously. We need synchronous release since we do create+destroy in `ZFS_IOC_CREATE`, and all references to the zvol need to be released before we do the "destroy" part of it. The workaround is to call add_disk() in a kernel thread, which changes the kernel's release codepath to do what we want.
…penzfs#16282)

The 6.9 kernel behaves differently in how it releases block devices. In the common case it will asynchronously release the device only after the return to userspace. This is different from the 6.8 and older kernels, which release the block devices synchronously. To get around this, call add_disk() from a workqueue so that the kernel uses a different codepath to release our zvols in the way we expect. This stops zfs_allow_010_pos from hanging.

Fixes: openzfs#16089
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
System information
Describe the problem you're observing
Under 6.9-rcX, the test `delegate/zfs_allow_010_pos` hangs and is eventually killed by the test runner. This does not appear to happen on earlier kernels (checked with 6.8.x, 6.1.x, and 5.10.x). The process list shows a `zfs create -V` process pegging a core. Inspecting the process shows it spinning in a loop at the bottom of `zfs_ioc_create()`. Setting `zfs_flags=512` shows that `EBUSY` is consistently being returned from the check at the top of `dsl_destroy_head_check_impl()`.

My gut feeling is that it's something around mounts, but I've not had a chance to look properly.
Describe how to reproduce the problem