
Linux 6.9: delegate/zfs_allow_010_pos hangs in zfs create -V #16089

Closed
robn opened this issue Apr 14, 2024 · 8 comments · Fixed by #16282
Labels: Type: Defect (Incorrect behavior, e.g. crash, hang)

Comments

robn (Member) commented Apr 14, 2024

System information

Type                  Version/Name
Distribution Name     Debian
Distribution Version  12
Kernel Version        6.9-rc[37]
Architecture          x86_64
OpenZFS Version       zfs-2.2.99-475_g04bae5ec9

Describe the problem you're observing

Under 6.9-rcX, the test delegate/zfs_allow_010_pos hangs, and is eventually killed by the test runner. This does not appear to happen on earlier kernels (checked with 6.8.x, 6.1.x and 5.10.x).

The process list shows a zfs create -V process pegging a core:

$ ps -fo pid,user,time,pcpu,args -p 8620
    PID USER         TIME %CPU COMMAND
   8620 staff1   01:20:16  100 zfs create -V 150m testpool/testfs/nvol.create.staff1.14706

Inspecting the process shows it spinning in this loop at the bottom of zfs_ioc_create():

			/*
			 * Volumes will return EBUSY and cannot be destroyed
			 * until all asynchronous minor handling (e.g. from
			 * setting the volmode property) has completed. Wait for
			 * the spa_zvol_taskq to drain then retry.
			 */
			error2 = dsl_destroy_head(fsname);
			while ((error2 == EBUSY) && (type == DMU_OST_ZVOL)) {
				error2 = spa_open(fsname, &spa, FTAG);
				if (error2 == 0) {
					taskq_wait(spa->spa_zvol_taskq);
					spa_close(spa, FTAG);
				}
				error2 = dsl_destroy_head(fsname);
			}

Setting zfs_flags=512 (ZFS_DEBUG_SET_ERROR, which logs every SET_ERROR() call to the ZFS debug log) shows that EBUSY is consistently being returned from this check at the top of dsl_destroy_head_check_impl():

        /*
         * ds_longholds counts long-term holds on the dataset; any
         * unexpected outstanding hold makes this check fail with EBUSY.
         */
        if (zfs_refcount_count(&ds->ds_longholds) != expected_holds)
                return (SET_ERROR(EBUSY));

Typical trace is:

    dsl_destroy_head_check_impl+1
    dsl_destroy_head_check+83
    dsl_sync_task_common+346
    dsl_sync_task+22
    dsl_destroy_head+263
    zfs_ioc_create+517
    zfsdev_ioctl_common+640
    zfsdev_ioctl+79
    __x64_sys_ioctl+147
    do_syscall_64+134
    entry_SYSCALL_64_after_hwframe+113
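
(One quick way to grab a comparable kernel-side stack without extra tooling, assuming the kernel was built with CONFIG_STACKTRACE; a task spinning on-CPU may take a few tries to catch:)

$ sudo cat /proc/8620/stack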

My gut feeling is that it's something around mounts, but I've not had a chance to look properly.

Describe how to reproduce the problem

$ uname -a
Linux shop 6.9.0-rc3 #1 SMP PREEMPT_DYNAMIC Sun Apr 14 20:37:05 AEST 2024 x86_64 GNU/Linux
$ /usr/local/share/zfs/zfs-tests.sh -vKx -t tests/functional/delegate/zfs_allow_010_pos
...
robn added the Type: Defect label Apr 14, 2024
robn (Member, Author) commented May 8, 2024

Retested with 6.9-rc7 (likely final 6.9-rc), no change.

robn (Member, Author) commented May 12, 2024

I did look into this a bit last night, though ultimately didn't get anywhere. There's more change in block/bdev.c between 6.8 and 6.9 than I realised; it's not as simple as just replacing struct bdev_handle with a plain old struct file. It looks like a little more might be required of block devices than before, around exclusive opens and "claims", so zvol might need a bit more work. I haven't been able to pin it down yet, but clearly something somewhere is not dropping a hold on the dataset when it should.

Incidentally, I think my shim for vdev_blkdev_put() should probably be bdev_fput(), not plain fput(). But that's not the whole problem here.

If anyone knows this better than me and is in the mood, feel free to take a look; I'm not reserving this issue for myself, just aware that 6.9 will drop soon and it'd be nice if ZFS worked properly there. I'll keep noodling on this as time permits, probably a couple of hours each weekend :)
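
For reference, a minimal sketch of the swap I mean (the HAVE_BDEV_FPUT guard and the helper's exact shape are illustrative, not the actual compat code):

/*
 * Sketch only. On 6.9+ the bdev is held via a struct file;
 * bdev_fput() drops the exclusive claim ("holder") before the final
 * fput(), while a plain fput() defers the real release to task work
 * that runs after the syscall returns to userspace.
 * HAVE_BDEV_FPUT is a hypothetical configure guard.
 */
static inline void
vdev_blkdev_put(struct file *bdev_file)
{
#ifdef HAVE_BDEV_FPUT
	bdev_fput(bdev_file);
#else
	fput(bdev_file);
#endif
}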

robn (Member, Author) commented May 14, 2024

Confirmed in 6.9.0, not that there was any doubt.

darkbasic commented
I guess it would be advisable to avoid 6.9 until this gets fixed, right?

q66 (Contributor) commented Jun 1, 2024

6.8 is now EOL, so this is kind of unfortunate...

robn mentioned this issue Jun 6, 2024
tonyhutter (Contributor) commented Jun 10, 2024

I'm able to reproduce this on Ubuntu 24 using the prebuilt 6.9.3 kernel debs from https://kernel.ubuntu.com/mainline/v6.9.3/. I noticed this in dmesg when I run zfs_allow_010_pos.ksh:

[   39.256293] UBSAN: array-index-out-of-bounds in /home/hutter/zfs/module/zfs/zap_micro.c:473:34
[   39.256296] index 2 is out of range for type 'mzap_ent_phys_t [1]'
[   39.256298] CPU: 8 PID: 2594 Comm: zpool Tainted: P           OE      6.9.3-060903-generic #202405300957
[   39.256300] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-4.module+el8.9.0+19570+14a90618 04/01/2014
[   39.256300] Call Trace:
[   39.256302]  <TASK>
[   39.256304]  dump_stack_lvl+0x76/0xa0
[   39.256310]  dump_stack+0x10/0x20
[   39.256311]  __ubsan_handle_out_of_bounds+0xcb/0x110
[   39.256314]  zap_lockdir_impl+0xb29/0xb40 [zfs]
[   39.256450]  zap_lockdir+0xc7/0x110 [zfs]
[   39.256549]  zap_lookup+0x58/0xd0 [zfs]
[   39.256644]  dsl_scan_init+0x156/0x610 [zfs]
[   39.256763]  ? _raw_spin_unlock+0xe/0x40
[   39.256772]  ? arc_cksum_compute.part.0+0x92/0x220 [zfs]
[   39.256882]  ? _raw_spin_unlock+0xe/0x40
[   39.256884]  ? dnode_rele_and_unlock+0x80/0x230 [zfs]
[   39.257003]  ? dnode_rele+0x48/0x90 [zfs]
[   39.257120]  ? zap_create_claim_norm_dnsize+0x14e/0x190 [zfs]
[   39.257231]  dsl_pool_create+0xcc/0x4a0 [zfs]
[   39.257352]  spa_create+0x8b6/0xe30 [zfs]
[   39.257471]  zfs_ioc_pool_create+0xaa/0x340 [zfs]
[   39.257579]  zfsdev_ioctl_common+0x82d/0xae0 [zfs]
[   39.257685]  ? __check_object_size.part.0+0x72/0x150
[   39.257688]  zfsdev_ioctl+0x57/0xf0 [zfs]
[   39.257791]  __x64_sys_ioctl+0xa0/0xf0
[   39.257793]  x64_sys_call+0x143b/0x25c0
[   39.257795]  do_syscall_64+0x7e/0x180
[   39.257797]  ? __memcg_slab_free_hook+0x115/0x180
[   39.257800]  ? fput+0xdb/0x130
[   39.257802]  ? kmem_cache_free+0x3dc/0x400
[   39.257803]  ? fput+0xdb/0x130
[   39.257805]  ? path_openat+0xd3/0x2c0
[   39.257806]  ? do_syscall_64+0x8b/0x180
[   39.257807]  ? do_filp_open+0xc0/0x170
[   39.257809]  ? putname+0x5b/0x80
[   39.257811]  ? do_sys_openat2+0x9f/0xe0
[   39.257813]  ? __x64_sys_openat+0x55/0xa0
[   39.257815]  ? syscall_exit_to_user_mode+0x81/0x270
[   39.257817]  ? do_syscall_64+0x8b/0x180
[   39.257818]  ? syscall_exit_to_user_mode+0x81/0x270
[   39.257820]  ? do_syscall_64+0x8b/0x180
[   39.257821]  ? switch_fpu_return+0x50/0xe0
[   39.257824]  ? syscall_exit_to_user_mode+0x81/0x270
[   39.257825]  ? do_syscall_64+0x8b/0x180
[   39.257827]  ? irqentry_exit_to_user_mode+0x76/0x270
[   39.257828]  ? irqentry_exit+0x43/0x50
[   39.257830]  ? clear_bhb_loop+0x15/0x70
[   39.257832]  ? clear_bhb_loop+0x15/0x70
[   39.257833]  ? clear_bhb_loop+0x15/0x70
[   39.257834]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   39.257835] RIP: 0033:0x79ccf8d24ded
[   39.257845] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
[   39.257846] RSP: 002b:00007ffeecdaa2f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   39.257848] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000079ccf8d24ded
[   39.257848] RDX: 00007ffeecdaa3b0 RSI: 0000000000005a00 RDI: 0000000000000003
[   39.257849] RBP: 00007ffeecdaa340 R08: 0000000000000000 R09: 0000000000000000
[   39.257850] R10: 000079ccf901a7a8 R11: 0000000000000246 R12: 000060e82bceb2c0
[   39.257850] R13: 000060e82bcf5cb0 R14: 00007ffeecdaa3b0 R15: 00007ffeecdad9a0
[   39.257852]  </TASK>

I'm using master (20c8bdd)

tonyhutter (Contributor) commented

> UBSAN: array-index-out-of-bounds in /home/hutter/zfs/module/zfs/zap_micro.c:473:34
> index 2 is out of range for type 'mzap_ent_phys_t [1]'

Nevermind, this may just be UBSAN noise.
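
For context on why this is likely noise: the microzap's on-disk header ends in a one-element array that is really used as a variable-length array of chunks filling the rest of the block, a pre-C99 idiom that UBSAN flags. Paraphrased (not verbatim) from the layout in zap_impl.h:

typedef struct mzap_phys {
	uint64_t mz_block_type;	/* ZBT_MICRO */
	uint64_t mz_salt;
	uint64_t mz_normflags;
	uint64_t mz_pad[5];
	/*
	 * Declared [1] but indexed up to however many chunks fit in
	 * the block, so mz_chunk[2] is valid at runtime yet out of
	 * bounds for the declared type; exactly what UBSAN reports.
	 */
	mzap_ent_phys_t mz_chunk[1];
} mzap_phys_t;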

tonyhutter added a commit to tonyhutter/zfs that referenced this issue Jun 18, 2024
The 6.9 kernel behaves differently in how it releases block devices.  In
the common case it will async release the device only after the return to
userspace.  This is different from the 6.8 and older kernels which
release the block devices synchronously.  To get around this, call
add_disk() from a workqueue so that the kernel uses a different
codepath to release our zvols in the way we expect.  This stops
zfs_allow_010_pos from hanging.

Fixes: openzfs#16089
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
tonyhutter (Contributor) commented

Fix is here: #16282

TL;DR: The 6.9 kernel asynchronously releases block devices at return-to-userspace time, while 6.8 and older release them synchronously. We need synchronous release since we do create+destroy in ZFS_IOC_CREATE, and all references to the zvol need to be released before we do the "destroy" part of it. The workaround is to call add_disk() in a kernel thread, which changes the kernel's release codepath to do what we want.
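
Roughly, the shape of that workaround (a minimal sketch with illustrative names; see #16282 for the real diff):

/* Needs <linux/workqueue.h>, <linux/completion.h>, <linux/blkdev.h>. */
struct zvol_add_work {
	struct work_struct work;
	struct gendisk *disk;
	int error;
	struct completion done;
};

static void
zvol_add_disk_fn(struct work_struct *w)
{
	struct zvol_add_work *zw =
	    container_of(w, struct zvol_add_work, work);

	/* add_disk() now runs in a kworker, not the ioctl task. */
	zw->error = add_disk(zw->disk);
	complete(&zw->done);
}

static int
zvol_add_disk_deferred(struct gendisk *disk)
{
	struct zvol_add_work zw = { .disk = disk };

	INIT_WORK_ONSTACK(&zw.work, zvol_add_disk_fn);
	init_completion(&zw.done);
	schedule_work(&zw.work);
	wait_for_completion(&zw.done);
	destroy_work_on_stack(&zw.work);
	return (zw.error);
}

With add_disk() called from a worker thread, the kernel releases the zvol through the codepath we expect, so the destroy half of ZFS_IOC_CREATE no longer spins on EBUSY.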

tonyhutter added a commit to tonyhutter/zfs that referenced this issue Jun 18, 2024
tonyhutter added a commit to tonyhutter/zfs that referenced this issue Jun 19, 2024
tonyhutter added a commit to tonyhutter/zfs that referenced this issue Jun 25, 2024
calccrypto pushed a commit to hpc/zfs that referenced this issue Jul 3, 2024
robn pushed a commit to robn/zfs that referenced this issue Jul 17, 2024
lundman pushed a commit to openzfsonwindows/openzfs that referenced this issue Sep 4, 2024