Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zfs sync hang #7038

Closed
ltz3317 opened this issue Jan 12, 2018 · 87 comments
Closed

zfs sync hang #7038

ltz3317 opened this issue Jan 12, 2018 · 87 comments
Labels
Status: Stale No recent activity for issue Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@ltz3317
Copy link

ltz3317 commented Jan 12, 2018

System information

Type Version/Name
Distribution Name centos 7.2 ,sync hang
Distribution Version
Linux Kernel 3.10.0-327.13.1.el7.x86_64
Architecture
ZFS Version v0.7.5-1
SPL Version v0.7.5-1

Describe the problem you're observing

sync hang ,mysql hang,kworker cpu 100

Describe how to reproduce the problem

high frequency create/destroy/clone

Include any warning/errors/backtraces from the system logs

dmsg:
[ 189.990968] Adjusting tsc more than 11% (8039035 vs 7759471)
[ 2522.644734] INFO: task mysqld:6608 blocked for more than 120 seconds.
[ 2522.644790] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2522.644854] mysqld D 0000000000000000 0 6608 5602 0x00000000
[ 2522.644860] ffff880feb913da0 0000000000000086 ffff881006eac500 ffff880feb913fd8
[ 2522.644865] ffff880feb913fd8 ffff880feb913fd8 ffff881006eac500 ffff880fffc8a180
[ 2522.644869] ffff880fffc8a000 ffff880fffc8a188 ffff880fffc8a028 0000000000000000
[ 2522.644873] Call Trace:
[ 2522.644882] [] schedule+0x29/0x70
[ 2522.644906] [] cv_wait_common+0x125/0x150 [spl]
[ 2522.644911] [] ? wake_up_atomic_t+0x30/0x30
[ 2522.644922] [] __cv_wait+0x15/0x20 [spl]
[ 2522.644995] [] zil_commit.part.12+0x8b/0x830 [zfs]
[ 2522.645006] [] ? spl_kmem_alloc+0xc7/0x170 [spl]
[ 2522.645017] [] ? tsd_set+0x324/0x500 [spl]
[ 2522.645022] [] ? mutex_lock+0x12/0x2f
[ 2522.645075] [] zil_commit+0x17/0x20 [zfs]
[ 2522.645128] [] zfs_fsync+0x77/0xf0 [zfs]
[ 2522.645179] [] zpl_fsync+0x65/0x90 [zfs]
[ 2522.645186] [] do_fsync+0x65/0xa0
[ 2522.645191] [] SyS_fsync+0x10/0x20
[ 2522.645196] [] system_call_fastpath+0x16/0x1b
[ 2522.645202] INFO: task mysqld:7514 blocked for more than 120 seconds.
[ 2522.645245] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2522.645296] mysqld D ffff880fb554bda0 0 7514 5602 0x00000000
[ 2522.645299] ffff880fb554bd30 0000000000000086 ffff880ff4c8f300 ffff880fb554bfd8
[ 2522.645303] ffff880fb554bfd8 ffff880fb554bfd8 ffff880ff4c8f300 ffff880fb554bdb0
[ 2522.645307] ffff88107ff8dec0 0000000000000002 ffffffff811f9420 ffff880fb554bda0
[ 2522.645312] Call Trace:
[ 2522.645316] [] ? unlock_two_nondirectories+0x60/0x60
[ 2522.645321] [] schedule+0x29/0x70
[ 2522.645324] [] inode_wait+0xe/0x20
[ 2522.645328] [] __wait_on_bit+0x60/0x90
[ 2522.645335] [] __inode_wait_for_writeback+0xaf/0xf0
[ 2522.645339] [] ? wake_atomic_t_function+0x40/0x40
[ 2522.645345] [] inode_wait_for_writeback+0x26/0x40
[ 2522.645348] [] evict+0x95/0x170
[ 2522.645351] [] iput+0xf5/0x180
[ 2522.645357] [] do_unlinkat+0x1ae/0x2b0
[ 2522.645362] [] ? SyS_readlinkat+0xd1/0x140
[ 2522.645367] [] SyS_unlink+0x16/0x20
[ 2522.645371] [] system_call_fastpath+0x16/0x1b
[ 2522.645402] INFO: task sync:2653 blocked for more than 120 seconds.
[ 2522.645443] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2522.645494] sync D ffffffff8120f8c0 0 2653 1455 0x00000000
[ 2522.645498] ffff880ffedd7d50 0000000000000082 ffff880f01449700 ffff880ffedd7fd8
[ 2522.645502] ffff880ffedd7fd8 ffff880ffedd7fd8 ffff880f01449700 ffff880ffedd7e80
[ 2522.645506] ffff880ffedd7e88 7fffffffffffffff ffff880f01449700 ffffffff8120f8c0
[ 2522.645509] Call Trace:
[ 2522.645515] [] ? generic_write_sync+0x60/0x60
[ 2522.645519] [] schedule+0x29/0x70
[ 2522.645523] [] schedule_timeout+0x209/0x2d0
[ 2522.645527] [] ? __queue_work+0x136/0x320
[ 2522.645530] [] ? __queue_delayed_work+0xaa/0x1a0
[ 2522.645534] [] ? try_to_grab_pending+0xb1/0x160
[ 2522.645538] [] ? generic_write_sync+0x60/0x60
[ 2522.645543] [] wait_for_completion+0x116/0x170
[ 2522.645548] [] ? wake_up_state+0x20/0x20
[ 2522.645552] [] sync_inodes_sb+0xb7/0x1e0
[ 2522.645557] [] ? generic_write_sync+0x60/0x60
[ 2522.645561] [] sync_inodes_one_sb+0x19/0x20
[ 2522.645565] [] iterate_supers+0xb2/0x110
[ 2522.645570] [] sys_sync+0x44/0xb0
[ 2522.645575] [] system_call_fastpath+0x16/0x1b
[ 2642.739175] INFO: task mysqld:6608 blocked for more than 120 seconds.
[ 2642.739228] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2642.739284] mysqld D 0000000000000000 0 6608 5602 0x00000000
[ 2642.739290] ffff880feb913da0 0000000000000086 ffff881006eac500 ffff880feb913fd8
[ 2642.739296] ffff880feb913fd8 ffff880feb913fd8 ffff881006eac500 ffff880fffc8a180
[ 2642.739300] ffff880fffc8a000 ffff880fffc8a188 ffff880fffc8a028 0000000000000000
[ 2642.739305] Call Trace:
[ 2642.739315] [] schedule+0x29/0x70
[ 2642.739340] [] cv_wait_common+0x125/0x150 [spl]
[ 2642.739345] [] ? wake_up_atomic_t+0x30/0x30
[ 2642.739357] [] __cv_wait+0x15/0x20 [spl]
[ 2642.739434] [] zil_commit.part.12+0x8b/0x830 [zfs]
[ 2642.739447] [] ? spl_kmem_alloc+0xc7/0x170 [spl]
[ 2642.739460] [] ? tsd_set+0x324/0x500 [spl]
[ 2642.739466] [] ? mutex_lock+0x12/0x2f
[ 2642.739526] [] zil_commit+0x17/0x20 [zfs]
[ 2642.739587] [] zfs_fsync+0x77/0xf0 [zfs]
[ 2642.739647] [] zpl_fsync+0x65/0x90 [zfs]
[ 2642.739654] [] do_fsync+0x65/0xa0
[ 2642.739659] [] SyS_fsync+0x10/0x20
[ 2642.739665] [] system_call_fastpath+0x16/0x1b
[ 2642.739670] INFO: task mysqld:7514 blocked for more than 120 seconds.
[ 2642.739727] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2642.739793] mysqld D ffff880fb554bda0 0 7514 5602 0x00000000
[ 2642.739798] ffff880fb554bd30 0000000000000086 ffff880ff4c8f300 ffff880fb554bfd8
[ 2642.739803] ffff880fb554bfd8 ffff880fb554bfd8 ffff880ff4c8f300 ffff880fb554bdb0
[ 2642.739808] ffff88107ff8dec0 0000000000000002 ffffffff811f9420 ffff880fb554bda0
[ 2642.739812] Call Trace:
[ 2642.739818] [] ? unlock_two_nondirectories+0x60/0x60
[ 2642.739823] [] schedule+0x29/0x70
[ 2642.739826] [] inode_wait+0xe/0x20
[ 2642.739831] [] __wait_on_bit+0x60/0x90
[ 2642.739839] [] __inode_wait_for_writeback+0xaf/0xf0
[ 2642.739843] [] ? wake_atomic_t_function+0x40/0x40
[ 2642.739850] [] inode_wait_for_writeback+0x26/0x40
[ 2642.739854] [] evict+0x95/0x170
[ 2642.739857] [] iput+0xf5/0x180
[ 2642.739862] [] do_unlinkat+0x1ae/0x2b0
[ 2642.739868] [] ? SyS_readlinkat+0xd1/0x140
[ 2642.739873] [] SyS_unlink+0x16/0x20
[ 2642.739878] [] system_call_fastpath+0x16/0x1b
[ 2642.739897] INFO: task sync:2653 blocked for more than 120 seconds.
[ 2642.739951] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2642.740017] sync D ffffffff8120f8c0 0 2653 1455 0x00000000
[ 2642.740021] ffff880ffedd7d50 0000000000000082 ffff880f01449700 ffff880ffedd7fd8
[ 2642.740026] ffff880ffedd7fd8 ffff880ffedd7fd8 ffff880f01449700 ffff880ffedd7e80
[ 2642.740031] ffff880ffedd7e88 7fffffffffffffff ffff880f01449700 ffffffff8120f8c0
[ 2642.740036] Call Trace:
[ 2642.740042] [] ? generic_write_sync+0x60/0x60
[ 2642.740047] [] schedule+0x29/0x70
[ 2642.740051] [] schedule_timeout+0x209/0x2d0
[ 2642.740056] [] ? __queue_work+0x136/0x320
[ 2642.740060] [] ? __queue_delayed_work+0xaa/0x1a0
[ 2642.740064] [] ? try_to_grab_pending+0xb1/0x160
[ 2642.740069] [] ? generic_write_sync+0x60/0x60
[ 2642.740093] [] wait_for_completion+0x116/0x170
[ 2642.740100] [] ? wake_up_state+0x20/0x20
[ 2642.740105] [] sync_inodes_sb+0xb7/0x1e0
[ 2642.740110] [] ? generic_write_sync+0x60/0x60
[ 2642.740116] [] sync_inodes_one_sb+0x19/0x20
[ 2642.740121] [] iterate_supers+0xb2/0x110
[ 2642.740127] [] sys_sync+0x44/0xb0
[ 2642.740134] [] system_call_fastpath+0x16/0x1b
[ 2762.834495] INFO: task mysqld:6608 blocked for more than 120 seconds.
[ 2762.834547] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2762.834604] mysqld D 0000000000000000 0 6608 5602 0x00000000
[ 2762.834609] ffff880feb913da0 0000000000000086 ffff881006eac500 ffff880feb913fd8
[ 2762.834615] ffff880feb913fd8 ffff880feb913fd8 ffff881006eac500 ffff880fffc8a180
[ 2762.834619] ffff880fffc8a000 ffff880fffc8a188 ffff880fffc8a028 0000000000000000
[ 2762.834624] Call Trace:
[ 2762.834634] [] schedule+0x29/0x70
[ 2762.834659] [] cv_wait_common+0x125/0x150 [spl]
[ 2762.834665] [] ? wake_up_atomic_t+0x30/0x30
[ 2762.834677] [] __cv_wait+0x15/0x20 [spl]
[ 2762.834748] [] zil_commit.part.12+0x8b/0x830 [zfs]
[ 2762.834761] [] ? spl_kmem_alloc+0xc7/0x170 [spl]
[ 2762.834774] [] ? tsd_set+0x324/0x500 [spl]
[ 2762.834779] [] ? mutex_lock+0x12/0x2f
[ 2762.834843] [] zil_commit+0x17/0x20 [zfs]
[ 2762.834908] [] zfs_fsync+0x77/0xf0 [zfs]
[ 2762.834971] [] zpl_fsync+0x65/0x90 [zfs]
[ 2762.834978] [] do_fsync+0x65/0xa0
[ 2762.834983] [] SyS_fsync+0x10/0x20
[ 2762.834989] [] system_call_fastpath+0x16/0x1b
[ 2762.834994] INFO: task mysqld:7514 blocked for more than 120 seconds.
[ 2762.835051] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2762.835118] mysqld D ffff880fb554bda0 0 7514 5602 0x00000000
[ 2762.835122] ffff880fb554bd30 0000000000000086 ffff880ff4c8f300 ffff880fb554bfd8
[ 2762.835127] ffff880fb554bfd8 ffff880fb554bfd8 ffff880ff4c8f300 ffff880fb554bdb0
[ 2762.835132] ffff88107ff8dec0 0000000000000002 ffffffff811f9420 ffff880fb554bda0
[ 2762.835137] Call Trace:
[ 2762.835143] [] ? unlock_two_nondirectories+0x60/0x60
[ 2762.835148] [] schedule+0x29/0x70
[ 2762.835151] [] inode_wait+0xe/0x20
[ 2762.835156] [] __wait_on_bit+0x60/0x90
[ 2762.835164] [] __inode_wait_for_writeback+0xaf/0xf0
[ 2762.835168] [] ? wake_atomic_t_function+0x40/0x40
[ 2762.835175] [] inode_wait_for_writeback+0x26/0x40
[ 2762.835179] [] evict+0x95/0x170
[ 2762.835183] [] iput+0xf5/0x180
[ 2762.835188] [] do_unlinkat+0x1ae/0x2b0
[ 2762.835195] [] ? SyS_readlinkat+0xd1/0x140
[ 2762.835199] [] SyS_unlink+0x16/0x20
[ 2762.835205] [] system_call_fastpath+0x16/0x1b
[ 2762.835223] INFO: task sync:2653 blocked for more than 120 seconds.
[ 2762.835277] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2762.835343] sync D ffffffff8120f8c0 0 2653 1455 0x00000000
[ 2762.835348] ffff880ffedd7d50 0000000000000082 ffff880f01449700 ffff880ffedd7fd8
[ 2762.835352] ffff880ffedd7fd8 ffff880ffedd7fd8 ffff880f01449700 ffff880ffedd7e80
[ 2762.835357] ffff880ffedd7e88 7fffffffffffffff ffff880f01449700 ffffffff8120f8c0
[ 2762.835362] Call Trace:
[ 2762.835369] [] ? generic_write_sync+0x60/0x60
[ 2762.835374] [] schedule+0x29/0x70
[ 2762.835378] [] schedule_timeout+0x209/0x2d0
[ 2762.835382] [] ? __queue_work+0x136/0x320
[ 2762.835386] [] ? __queue_delayed_work+0xaa/0x1a0
[ 2762.835390] [] ? try_to_grab_pending+0xb1/0x160
[ 2762.835396] [] ? generic_write_sync+0x60/0x60
[ 2762.835409] [] wait_for_completion+0x116/0x170
[ 2762.835417] [] ? wake_up_state+0x20/0x20
[ 2762.835422] [] sync_inodes_sb+0xb7/0x1e0
[ 2762.835427] [] ? generic_write_sync+0x60/0x60
[ 2762.835433] [] sync_inodes_one_sb+0x19/0x20
[ 2762.835438] [] iterate_supers+0xb2/0x110
[ 2762.835443] [] sys_sync+0x44/0xb0
[ 2762.835451] [] system_call_fastpath+0x16/0x1b
[ 2882.929813] INFO: task mysqld:6608 blocked for more than 120 seconds.
[ 2882.929861] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2882.929913] mysqld D 0000000000000000 0 6608 5602 0x00000000
[ 2882.929919] ffff880feb913da0 0000000000000086 ffff881006eac500 ffff880feb913fd8
[ 2882.929924] ffff880feb913fd8 ffff880feb913fd8 ffff881006eac500 ffff880fffc8a180
[ 2882.929928] ffff880fffc8a000 ffff880fffc8a188 ffff880fffc8a028 0000000000000000
[ 2882.929932] Call Trace:
[ 2882.929942] [] schedule+0x29/0x70
[ 2882.929967] [] cv_wait_common+0x125/0x150 [spl]
[ 2882.929972] [] ? wake_up_atomic_t+0x30/0x30
[ 2882.929983] [] __cv_wait+0x15/0x20 [spl]
[ 2882.930049] [] zil_commit.part.12+0x8b/0x830 [zfs]
[ 2882.930059] [] ? spl_kmem_alloc+0xc7/0x170 [spl]
[ 2882.930070] [] ? tsd_set+0x324/0x500 [spl]
[ 2882.930075] [] ? mutex_lock+0x12/0x2f
[ 2882.930129] [] zil_commit+0x17/0x20 [zfs]
[ 2882.930181] [] zfs_fsync+0x77/0xf0 [zfs]
[ 2882.930232] [] zpl_fsync+0x65/0x90 [zfs]
[ 2882.930238] [] do_fsync+0x65/0xa0
[ 2882.930243] [] SyS_fsync+0x10/0x20
[ 2882.930248] [] system_call_fastpath+0x16/0x1b
[ 4802.367346] perf interrupt took too long (2507 > 2500), lowering kernel.perf_event_max_sample_rate to 50000

sync process stack:
[] sync_inodes_sb+0xb7/0x1e0
[] sync_inodes_one_sb+0x19/0x20
[] iterate_supers+0xb2/0x110
[] sys_sync+0x44/0xb0
[] system_call_fastpath+0x16/0x1b
[] 0xffffffffffffffff

kworker process stack:
cat /proc/3445/stack
[] dmu_zfetch+0x455/0x4f0 [zfs]
[] dbuf_read+0x8ea/0x9f0 [zfs]
[] dnode_hold_impl+0xc6/0xc30 [zfs]
[] 0xffffffffffffffff

cat /proc/3445/stack
[] 0xffffffffffffffff
[root@ifcos ~]# cat /proc/3445/stack
[] zfs_zget+0xfc/0x250 [zfs]
[] zfs_get_data+0x57/0x2d0 [zfs]
[] zil_commit.part.12+0x41c/0x830 [zfs]
[] zil_commit+0x17/0x20 [zfs]
[] zpl_writepages+0xd6/0x170 [zfs]
[] do_writepages+0x1e/0x40
[] __writeback_single_inode+0x40/0x220
[] writeback_sb_inodes+0x25e/0x420
[] wb_writeback+0xff/0x2f0
[] bdi_writeback_workfn+0x115/0x460
[] process_one_work+0x17b/0x470
[] worker_thread+0x11b/0x400
[] kthread+0xcf/0xe0
[] ret_from_fork+0x58/0x90
[] 0xffffffffffffffff

cat /proc/3445/stack
[] dmu_zfetch+0x455/0x4f0 [zfs]
[] __dbuf_hold_impl+0x135/0x5a0 [zfs]
[] 0xffffffffffffffff

cat /proc/3445/stack
[] dbuf_find+0x1c9/0x1d0 [zfs]
[] __dbuf_hold_impl+0x42/0x5a0 [zfs]
[] 0xffffffffffffffff

cat /proc/3445/stack
[] dmu_zfetch+0x455/0x4f0 [zfs]
[] dbuf_read+0x756/0x9f0 [zfs]
[] dnode_hold_impl+0xc6/0xc30 [zfs]

@xTire
Copy link

xTire commented Jan 12, 2018

The same situation with me too.

@behlendorf behlendorf added this to the 0.8.0 milestone Jan 12, 2018
@imathrowback
Copy link

Same issue. No reproduction steps identified, seems "random".

Most commonly appears on long-running disk intensive tasks (in my case caused by BURP backup software) but getting the same blocked stack traces at generic_write_sync and kworker at 100%.

Here is output from 'echo l > /proc/sysrq-trigger'

[5490997.296870] sending NMI to all CPUs:
[5490997.297924] NMI backtrace for cpu 0
[5490997.297926] CPU: 0 PID: 27539 Comm: kworker/u8:1 Tainted: P OE ------------ 3.10.0-693.el7.x86_64 #1
[5490997.297928] Hardware name: Gigabyte Technology Co., Ltd. G33-DS3R/G33-DS3R, BIOS F4 06/29/2007
[5490997.297929] Workqueue: writeback bdi_writeback_workfn (flush-zfs-1)
[5490997.297931] task: ffff8800c7db3f40 ti: ffff880120aac000 task.ti: ffff880120aac000
[5490997.297932] RIP: 0010:[] [] read_tsc+0xa/0x20
[5490997.297933] RSP: 0018:ffff880120aaf7c8 EFLAGS: 00000246
[5490997.297935] RAX: ffff8800a5cdf0d8 RBX: ffffffff81a0e3c0 RCX: 0000000000000000
[5490997.297936] RDX: 0000000000000058 RSI: ffff8800a5cdf080 RDI: ffffffff81a0e3c0
[5490997.297937] RBP: ffff880120aaf7c8 R08: 000000000053c8f3 R09: 0000000000000000
[5490997.297939] R10: 00000000006db6e0 R11: ffffea0001935000 R12: ffff880120aaf828
[5490997.297940] R13: 000000008aaf12f6 R14: ffff8800a5cdf500 R15: ffff880127d6f3e8
[5490997.297942] FS: 0000000000000000(0000) GS:ffff88012fc00000(0000) knlGS:0000000000000000
[5490997.297943] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[5490997.297944] CR2: 0000000000dde6d0 CR3: 000000005a22f000 CR4: 00000000000007f0
[5490997.297946] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[5490997.297947] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[5490997.297948] Stack:
[5490997.297949] ffff880120aaf7f0 ffffffff810ea6ae 0000000000000005 ffff8800a5cdf080
[5490997.297951] ffff880127d6f3b8 ffff880120aaf868 ffffffffc0b73a7c ffff8800c7db3f40
[5490997.297952] ffff8800c7db3f40 000000000000b048 000000000000b049 ffff880120aaf828
[5490997.297954] Call Trace:
[5490997.297955] [] getrawmonotonic64+0x2e/0xc0
[5490997.297956] [] dmu_zfetch+0x39c/0x4c0 [zfs]
[5490997.297957] [] dbuf_read+0x8e2/0x9e0 [zfs]
[5490997.297959] [] ? spl_kmem_free+0x35/0x40 [spl]
[5490997.297960] [] ? dbuf_hold_impl+0xb9/0xd0 [zfs]
[5490997.297961] [] dnode_hold_impl+0xc6/0xc30 [zfs]
[5490997.297963] [] ? avl_add+0x4a/0x80 [zavl]
[5490997.297964] [] ? mutex_lock+0x12/0x2f
[5490997.297965] [] dnode_hold+0x1b/0x20 [zfs]
[5490997.297967] [] dmu_bonus_hold+0x32/0x1d0 [zfs]
[5490997.297968] [] sa_buf_hold+0xe/0x10 [zfs]
[5490997.297969] [] zfs_zget+0x123/0x250 [zfs]
[5490997.297971] [] zfs_get_data+0x57/0x2d0 [zfs]
[5490997.297972] [] zil_commit.part.12+0x41c/0x830 [zfs]
[5490997.297973] [] zil_commit+0x17/0x20 [zfs]
[5490997.297975] [] zpl_writepages+0xd6/0x170 [zfs]
[5490997.297976] [] do_writepages+0x1e/0x40
[5490997.297977] [] __writeback_single_inode+0x40/0x220
[5490997.297979] [] writeback_sb_inodes+0x1c4/0x490
[5490997.297980] [] wb_writeback+0xff/0x2f0
[5490997.297981] [] ? set_worker_desc+0x86/0xb0
[5490997.297983] [] bdi_writeback_workfn+0x115/0x460
[5490997.297984] [] process_one_work+0x17a/0x440
[5490997.297985] [] worker_thread+0x126/0x3c0
[5490997.297987] [] ? manage_workers.isra.24+0x2a0/0x2a0
[5490997.297988] [] kthread+0xcf/0xe0
[5490997.297989] [] ? insert_kthread_work+0x40/0x40
[5490997.297991] [] ret_from_fork+0x58/0x90
[5490997.297992] [] ? insert_kthread_work+0x40/0x40
[5490997.297994] Code: 00 55 8b 05 39 cc ad 00 48 89 e5 5d c3 0f 1f 40 00 55 b8 ed ff ff ff 48 89 e5 5d c3 0f 1f 44 00 00 55 48 89 e5 66 66 90 0f ae e8 <0f> 31 89 c0 48 c1 e2 20 48 09 c2 48 89 d0 5d c3 66 0f 1f 44 00

and another shortly after

[5490939.586576] SysRq : Show backtrace of all active CPUs
[5490939.587701] sending NMI to all CPUs:
[5490939.588755] NMI backtrace for cpu 0
[5490939.588757] CPU: 0 PID: 27539 Comm: kworker/u8:1 Tainted: P OE ------------ 3.10.0-693.el7.x86_64 #1
[5490939.588759] Hardware name: Gigabyte Technology Co., Ltd. G33-DS3R/G33-DS3R, BIOS F4 06/29/2007
[5490939.588760] Workqueue: writeback bdi_writeback_workfn (flush-zfs-1)
[5490939.588762] task: ffff8800c7db3f40 ti: ffff880120aac000 task.ti: ffff880120aac000
[5490939.588763] RIP: 0010:[] [] getrawmonotonic64+0x7a/0xc0
[5490939.588764] RSP: 0018:ffff880120aaf7d8 EFLAGS: 00000207
[5490939.588766] RAX: 00000000006db6e0 RBX: ffffffff81a0e3c0 RCX: 0000000000000018
[5490939.588767] RDX: 0000000021f8aba3 RSI: 00000000000cc2e8 RDI: 00000000001dc6c8
[5490939.588768] RBP: ffff880120aaf7f0 R08: 000000000053c8b9 R09: 0000000000000000
[5490939.588770] R10: 00000000006db6e0 R11: ffffea0001935000 R12: ffff880120aaf828
[5490939.588771] R13: 000000008aad501e R14: ffff8800a5cdf500 R15: ffff880127d6f3e8
[5490939.588772] FS: 0000000000000000(0000) GS:ffff88012fc00000(0000) knlGS:0000000000000000
[5490939.588774] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[5490939.588775] CR2: 00007f4379aae8e0 CR3: 00000000b9a26000 CR4: 00000000000007f0
[5490939.588777] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[5490939.588778] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[5490939.588779] Stack:
[5490939.588781] 0000000000000007 0000000000000000 ffff880127d6f3b8 ffff880120aaf868
[5490939.588782] ffffffffc0b73a7c ffff8800c7db3f40 ffff8800c7db3f40 000000000000b048
[5490939.588784] 000000000000b049 ffff880120aaf828 000000000053c8b9 0000000022056e52
[5490939.588785] Call Trace:
[5490939.588786] [] dmu_zfetch+0x39c/0x4c0 [zfs]
[5490939.588787] [] dbuf_read+0x8e2/0x9e0 [zfs]
[5490939.588789] [] ? spl_kmem_free+0x35/0x40 [spl]
[5490939.588790] [] ? dbuf_hold_impl+0xb9/0xd0 [zfs]
[5490939.588791] [] dnode_hold_impl+0xc6/0xc30 [zfs]
[5490939.588793] [] ? avl_add+0x4a/0x80 [zavl]
[5490939.588794] [] ? mutex_lock+0x12/0x2f
[5490939.588795] [] dnode_hold+0x1b/0x20 [zfs]
[5490939.588797] [] dmu_bonus_hold+0x32/0x1d0 [zfs]
[5490939.588798] [] sa_buf_hold+0xe/0x10 [zfs]
[5490939.588799] [] zfs_zget+0x123/0x250 [zfs]
[5490939.588801] [] zfs_get_data+0x57/0x2d0 [zfs]
[5490939.588802] [] zil_commit.part.12+0x41c/0x830 [zfs]
[5490939.588803] [] zil_commit+0x17/0x20 [zfs]
[5490939.588804] [] zpl_writepages+0xd6/0x170 [zfs]
[5490939.588806] [] do_writepages+0x1e/0x40
[5490939.588807] [] __writeback_single_inode+0x40/0x220
[5490939.588809] [] writeback_sb_inodes+0x1c4/0x490
[5490939.588810] [] wb_writeback+0xff/0x2f0
[5490939.588811] [] ? set_worker_desc+0x86/0xb0
[5490939.588813] [] bdi_writeback_workfn+0x115/0x460
[5490939.588814] [] process_one_work+0x17a/0x440
[5490939.588815] [] worker_thread+0x126/0x3c0
[5490939.588817] [] ? manage_workers.isra.24+0x2a0/0x2a0
[5490939.588818] [] kthread+0xcf/0xe0
[5490939.588819] [] ? insert_kthread_work+0x40/0x40
[5490939.588820] [] ret_from_fork+0x58/0x90
[5490939.588822] [] ? insert_kthread_work+0x40/0x40
[5490939.588824] Code: da 00 48 8b 15 58 b1 da 00 44 3b 2d b9 b0 da 00 75 b0 48 29 f0 31 f6 48 21 c7 44 89 d0 48 0f 49 f7 48 0f af f0 4c 01 ce 48 d3 fe <48> 8d 04 16 48 3d ff c9 9a 3b 76 2e 31 d2 0f 1f 84 00 00 00 00

@bud4
Copy link

bud4 commented Feb 12, 2018

same problem (hp-dataprotector backup software; vbda process). While run full backup on a snapshot zfs receive does not go on . attach file output of perf record -a and perf sched . i'm using deadline scheduler on block device to try to workaround this problem. some problem with noop scheduler.
zfs version :0.6.5.6-0ubuntu16
Linux kernel: 4.4.0-101-generic #124-Ubuntu (Ubuntu 16.04.3 LTS)
perf_record-a2.data.gz
perf_shed_record2.data.gz

 iostat -xd 20
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdd               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
sdc               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
sdf               0,00     0,00    4,25   15,30   406,50  1061,58   150,19     0,03    1,73    1,08    1,91   0,40   0,78
sde               0,00     0,00    4,35   16,60   419,43   752,77   111,90     0,03    1,30    0,83    1,42   0,29   0,60
sdg               0,00     0,00  319,35    0,00  9188,00     0,00    57,54     1,01    3,16    3,16    0,00   1,71  54,56
sdh               0,00     0,00  423,45    0,00 12698,40     0,00    59,98     0,93    2,20    2,20    0,00   1,29  54,44
sdj               0,00     0,00  371,45    0,00 10857,80     0,00    58,46     1,22    3,28    3,28    0,00   1,69  62,82
sdi               0,00     0,00  515,95    0,00 15066,20     0,00    58,40     1,08    2,10    2,10    0,00   1,16  59,78
sdk               0,00     0,00  295,80    0,00  8717,00     0,00    58,94     0,60    2,05    2,05    0,00   1,39  41,16
sdl               0,00     0,00  338,10    0,00  9420,80     0,00    55,73     0,70    2,06    2,06    0,00   1,35  45,60
sdq               0,00     0,00  226,60    0,00  6327,40     0,00    55,85     0,90    3,98    3,98    0,00   2,39  54,10
sdo               0,00     0,00  482,35    0,00 14504,60     0,00    60,14     1,36    2,81    2,81    0,00   1,30  62,94
sdn               0,00     0,00  430,60    0,00 12728,60     0,00    59,12     2,08    4,82    4,82    0,00   1,85  79,58
sdm               0,00     0,00  414,70    0,00 12191,80     0,00    58,80     1,00    2,42    2,42    0,00   1,28  53,06
sdr               0,00     0,00  256,05    0,00  7346,40     0,00    57,38     0,52    2,04    2,04    0,00   1,48  37,96
sdp               0,00     0,00  438,55    0,00 12945,60     0,00    59,04     0,99    2,25    2,25    0,00   1,26  55,08
zpool status
  pool: pool_z2_samba
 state: ONLINE
  scan: scrub canceled on Fri Dec  1 09:06:55 2017
config:

	NAME        STATE     READ WRITE CKSUM
	pool_z2_samba  ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    sdg     ONLINE       0     0     0
	    sdh     ONLINE       0     0     0
	    sdi     ONLINE       0     0     0
	    sdj     ONLINE       0     0     0
	    sdk     ONLINE       0     0     0
	    sdl     ONLINE       0     0     0
	  raidz2-1  ONLINE       0     0     0
	    sdm     ONLINE       0     0     0
	    sdn     ONLINE       0     0     0
	    sdo     ONLINE       0     0     0
	    sdp     ONLINE       0     0     0
	    sdq     ONLINE       0     0     0
	    sdr     ONLINE       0     0     0
	logs
	  mirror-2  ONLINE       0     0     0
	    sdc     ONLINE       0     0     0
	    sdd     ONLINE       0     0     0
	cache
	  sde       ONLINE       0     0     0
	  sdf       ONLINE       0     0     0

errors: No known data errors

@bud4
Copy link

bud4 commented May 18, 2018

the problem occurs very often. I attach other data
from : echo 'w' > /proc/sysrq-trigger
and
perf record -ag
echo_w_sysrq-trigger.gz
perf.data.gz

@chadjoan
Copy link

Hi all,

I experienced this problem as well (multiple times). Then I found this thread and tried switching from noop scheduler to deadline scheduler and still experienced the problem.

I am running a 4.16.8 Gentoo kernel, with musl libc, and using fairly recent ZFS from master branch (zfs commit 670d74b , spl commit 1149b62).

This tends to happen after doing a lot of compilation. I merged about 1500 packages with the deadline scheduler before having this hit. The noop scheduler would do about 500-1500 packages before quitting. My data set is probably too small to do any comparison, and I didn't bother writing down specifics (the workflow can be complicated to begin with...).

Here's my dmesg just after the hang:
https://pastebin.com/MayYqGaV

Note that the system has 56 threads to play with and would spend some fraction of its time with a load average around 60; perhaps that is plenty of opportunity for resource contention and deadlocks?

HTH.

@vector-of-bool
Copy link

I am also experiencing this issue. I don't have as many helpful logs or diagnostics, but I can also say I am similarly doing a lot of heavy compilation and small IO. I'm also using it as a Docker storage driver, but the lockups will happen with or without Docker. I'm on an older 0.6.5.6-0ubuntu18. A reboot temporarily fixes the issue.

@bud4
Copy link

bud4 commented Jun 8, 2018

another time (4.4.0-116-generic #140-Ubuntu zfs 0.6.5.6-0ubuntu16) While run full backup
zfsBlock.log.gz

@AndCycle
Copy link

AndCycle commented Jun 20, 2018

hit similar issue, happened on Gentoo during emerge --emptytree -j4 @world,
which do all package recompile with lot's parallel access and file locking,
kworker 100% cpu usage,

kernel 4.14.39
ZFS 0.7.8
SPL 0.7.8

NMI backtrace
https://pastebin.com/rYAhR1ri

@imathrowback
Copy link

imathrowback commented Jun 20, 2018

As a troubleshooting step, I recommend pulling the drives and doing some read/write sector tests, make sure there are no controller or bad sector issues. I've hit this bug on a system where the drive seemed fine, except for the odd ZFS hang until the drive became completely unresponsive a few months later and failed.

@stewartadam
Copy link

Just chiming in to say 'me too'. Definitely seems to be a bug, disks are healthy.

@bud4
Copy link

bud4 commented Sep 20, 2018

another time sync and other processes go lock during zfs receive and full backup (4.4.0-135-generic #161-Ubuntu)
perf.data.gz
kernelStateD.log

@denizzzka
Copy link

Have this issue too.

Not very big IO. If I use Docker and compile something inside Docker box in ~12 hours system hangs (X11 freezes, mouse cursor is not moves). HDD LED blinks few times immediately after it and no more.

Hardware reset fixes this issue but sometimes scrub finds errors and three times I lost my pool (import fails with different kernel stack traces)

Also, may be it is important, I have root on this ZFS pool.

Tried to change my hardware to AMD desktop and old Intel server without any result. So I think this is software issue.

I am on Debian:

Linux localhost 4.14.0-3-amd64 #1 SMP Debian 4.14.17-1 (2018-02-14) x86_64 GNU/Linux
zfs-dkms 0.7.9-3

@bud4
Copy link

bud4 commented Dec 6, 2018

another time sync and other processes go lock during zfs receive and full backup
attach
echo w > /proc/sysrq-trigger
echo l > /proc/sysrq-trigger
kern.log

@bud4
Copy link

bud4 commented Jan 24, 2019

Type Version/Name
Distribution Name Ubuntu
Distribution Version 16.04.5 LTS
Linux Kernel 4.4.0-141-generic
Architecture x86_64
ZFS Version 0.6.5.6-0ubuntu26
SPL Version 0.6.5.6-0ubuntu4
kernel.log.gz

it happened another time. Please help me!

@AndCycle
Copy link

@bud4 currently most developer already got their hand full,

so if you wanna they take on this problem first,
you have to provide test script that 100% able to reproduce this problem.

so they can easily reproduce this problem on their end with some automation test environment.

and it also preferable you take a spare machine upgrade to 0.7.12 and still able to reproduce this issue.

@denizzzka
Copy link

denizzzka commented Jan 24, 2019

@AndCycle
This problem is a real pain in the ass. If one of us could have provided reproduce case this issue was been closed long time ago, I believe.

This issue may not appear for 2 months and then begin to appear twice per day.

Perhaps we need to turn on some additional logging and read it throught rs232? Tell it. Just in simple words, we are not all developers here and do not know how to hunt this bug.

@bud4
Copy link

bud4 commented Jan 24, 2019

@AndCycle
Very difficult for me to be able to create scripts that reproduce the event. Also because I only have a master and a slave both in production.
The bug often occurs on the slave while the full backup is running and at the same time there is a zfs receive operation. The Hp-dataprotector agent reads the data from the mounted snapshots of the various filessytems. Maximum 6 simultaneous reading processes.
The bug also occurs on the master while there are about 800 smbd connections. (Many read operations and few write operations, but some writing operations are very intensive). In the zpool there are about 150 filesystems with active dedup and compression. The total data is about 60 Tera in raidz2 on two vdev of 6 disks each.
Please at least suggest what information to collect the next time the problem occurs. What would be useful? Do you have any scripts I can run?

@bud4
Copy link

bud4 commented Jan 29, 2019

hopeless
kernel.log.gz

@bglod
Copy link

bglod commented Feb 4, 2019

I also have this issue, and I am running zfs on root. I also run several dockers.

When the issue occurs, I can always trace it back to a period of high I/O. The type of disk activity doesn't seem to matter because sometimes it will happen during a routine scrub, before which the disks had been fairly idle for hours. Not all periods of high I/O cause the problem though.

Seemingly random problems like this, to me, point to a hardware issue (likely disk or storage controller), but badblocks and SMART tests have been coming up clean, so I'm not sure what to think at this point. I'll keep an eye on this and try to help out with logs or test data where I can.

@camaer
Copy link

camaer commented Feb 11, 2019

Same issue for us on multiple servers with high frequency create/destroy/clone.

Using Debian 9.7 and ZOL 0.7.12

Here are some logs:

execve("/usr/sbin/update-initramfs", ["update-initramfs", "-u"], [/* 16 vars */]) = 0
brk(NULL)                               = 0x56112fbcd000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f17c42f7000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=65446, ...}) = 0
mmap(NULL, 65446, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f17c42e7000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\4\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1689360, ...}) = 0
mmap(NULL, 3795296, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f17c3d38000
mprotect(0x7f17c3ecd000, 2097152, PROT_NONE) = 0
mmap(0x7f17c40cd000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x195000) = 0x7f17c40cd000
mmap(0x7f17c40d3000, 14688, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f17c40d3000
close(3)                                = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f17c42e5000
arch_prctl(ARCH_SET_FS, 0x7f17c42e5700) = 0
mprotect(0x7f17c40cd000, 16384, PROT_READ) = 0
mprotect(0x56112f1dd000, 8192, PROT_READ) = 0
mprotect(0x7f17c42fa000, 4096, PROT_READ) = 0
munmap(0x7f17c42e7000, 65446)           = 0
getpid()                                = 8030
rt_sigaction(SIGCHLD, {sa_handler=0x56112efd3ef0, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER, sa_restorer=0x7f17c3d6b060}, NULL, 8) = 0
geteuid()                               = 0
brk(NULL)                               = 0x56112fbcd000
brk(0x56112fbee000)                     = 0x56112fbee000
getppid()                               = 8028
stat("/root", {st_mode=S_IFDIR|0700, st_size=19, ...}) = 0
stat(".", {st_mode=S_IFDIR|0700, st_size=19, ...}) = 0
open("/usr/sbin/update-initramfs", O_RDONLY) = 3
fcntl(3, F_DUPFD, 10)                   = 10
close(3)                                = 0
fcntl(10, F_SETFD, FD_CLOEXEC)          = 0
rt_sigaction(SIGINT, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGINT, {sa_handler=0x56112efd3ef0, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER, sa_restorer=0x7f17c3d6b060}, NULL, 8) = 0
rt_sigaction(SIGQUIT, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGQUIT, {sa_handler=SIG_DFL, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER, sa_restorer=0x7f17c3d6b060}, NULL, 8) = 0
rt_sigaction(SIGTERM, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGTERM, {sa_handler=SIG_DFL, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER, sa_restorer=0x7f17c3d6b060}, NULL, 8) = 0
read(10, "#!/bin/sh\n\nSTATEDIR=/var/lib/ini"..., 8192) = 8192
faccessat(AT_FDCWD, "/etc/initramfs-tools/update-initramfs.conf", R_OK) = 0
open("/etc/initramfs-tools/update-initramfs.conf", O_RDONLY) = 3
fcntl(3, F_DUPFD, 10)                   = 11
close(3)                                = 0
fcntl(11, F_SETFD, FD_CLOEXEC)          = 0
read(11, "#\n# Configuration file for updat"..., 8192) = 378
read(11, "", 8192)                      = 0
close(11)                               = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8031
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 8031
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8031, si_uid=0, si_status=1, si_utime=0, si_stime=0} ---
rt_sigreturn({mask=[]})                 = 8031
read(10, ";\n\tu)\n\t\tupdate\n\t\t;;\nesac\n", 8192) = 25
pipe([3, 4])                            = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8032
close(4)                                = 0
read(3, "4.15.18-10-pve\n4.15.18-9-pve\n4.1"..., 128) = 128
read(3, ".13.16-3-pve\n4.13.16-2-pve\n4.13."..., 128) = 128
read(3, "3.8-3-pve\n4.13.8-2-pve\n4.13.4-1-"..., 128) = 36
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8032, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn({mask=[]})                 = 36
read(3, "", 128)                        = 0
close(3)                                = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 8032
faccessat(AT_FDCWD, "/proc/mounts", R_OK) = 0
stat("/usr/local/sbin/ischroot", 0x7ffe69edd4e0) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/ischroot", 0x7ffe69edd4e0) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/ischroot", 0x7ffe69edd4e0) = -1 ENOENT (No such file or directory)
stat("/usr/bin/ischroot", {st_mode=S_IFREG|0755, st_size=10552, ...}) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8035
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 8035
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8035, si_uid=0, si_status=1, si_utime=0, si_stime=0} ---
rt_sigreturn({mask=[]})                 = 8035
pipe([3, 4])                            = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8036
close(4)                                = 0
read(3, "", 128)                        = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8036, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn({mask=[]})                 = 0
close(3)                                = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 8036
stat("/boot/initrd.img-4.15.18-10-pve", {st_mode=S_IFREG|0644, st_size=36184842, ...}) = 0
stat("/usr/local/sbin/sha1sum", 0x7ffe69edd340) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/sha1sum", 0x7ffe69edd340) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/sha1sum", 0x7ffe69edd340) = -1 ENOENT (No such file or directory)
stat("/usr/bin/sha1sum", {st_mode=S_IFREG|0755, st_size=47976, ...}) = 0
pipe([3, 4])                            = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8037
close(4)                                = 0
stat("/usr/local/sbin/diff", 0x7ffe69edd340) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/diff", 0x7ffe69edd340) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/diff", 0x7ffe69edd340)  = -1 ENOENT (No such file or directory)
stat("/usr/bin/diff", {st_mode=S_IFREG|0755, st_size=146824, ...}) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8038
close(3)                                = 0
close(-1)                               = -1 EBADF (Bad file descriptor)
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 8037
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8037, si_uid=0, si_status=0, si_utime=9, si_stime=1} ---
rt_sigreturn({mask=[]})                 = 8037
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 8038
faccessat(AT_FDCWD, "/boot/initrd.img-4.15.18-10-pve", R_OK) = 0
faccessat(AT_FDCWD, "/boot/initrd.img-4.15.18-10-pve.dpkg-bak", R_OK) = 0
stat("/usr/local/sbin/rm", 0x7ffe69edd520) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/rm", 0x7ffe69edd520) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/rm", 0x7ffe69edd520)    = -1 ENOENT (No such file or directory)
stat("/usr/bin/rm", 0x7ffe69edd520)     = -1 ENOENT (No such file or directory)
stat("/sbin/rm", 0x7ffe69edd520)        = -1 ENOENT (No such file or directory)
stat("/bin/rm", {st_mode=S_IFREG|0755, st_size=64424, ...}) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8039
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 8039
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8039, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn({mask=[]})                 = 8039
stat("/usr/local/sbin/ln", 0x7ffe69edd540) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/ln", 0x7ffe69edd540) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/ln", 0x7ffe69edd540)    = -1 ENOENT (No such file or directory)
stat("/usr/bin/ln", 0x7ffe69edd540)     = -1 ENOENT (No such file or directory)
stat("/sbin/ln", 0x7ffe69edd540)        = -1 ENOENT (No such file or directory)
stat("/bin/ln", {st_mode=S_IFREG|0755, st_size=56240, ...}) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8040
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 8040
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8040, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn({mask=[]})                 = 8040
write(1, "update-initramfs: Generating /bo"..., 61update-initramfs: Generating /boot/initrd.img-4.15.18-10-pve
) = 61
stat("/usr/local/sbin/mkinitramfs", 0x7ffe69edd580) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/mkinitramfs", 0x7ffe69edd580) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/mkinitramfs", {st_mode=S_IFREG|0755, st_size=10255, ...}) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8041
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 8041
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8041, si_uid=0, si_status=0, si_utime=1, si_stime=3} ---
rt_sigreturn({mask=[]})                 = 8041
stat("/usr/local/sbin/mv", 0x7ffe69edd540) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/mv", 0x7ffe69edd540) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/mv", 0x7ffe69edd540)    = -1 ENOENT (No such file or directory)
stat("/usr/bin/mv", 0x7ffe69edd540)     = -1 ENOENT (No such file or directory)
stat("/sbin/mv", 0x7ffe69edd540)        = -1 ENOENT (No such file or directory)
stat("/bin/mv", {st_mode=S_IFREG|0755, st_size=126416, ...}) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 14543
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 14543
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=14543, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn({mask=[]})                 = 14543
open("/var/lib/initramfs-tools/4.15.18-10-pve", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
fcntl(1, F_DUPFD, 10)                   = 11
close(1)                                = 0
fcntl(11, F_SETFD, FD_CLOEXEC)          = 0
dup2(3, 1)                              = 1
close(3)                                = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 14544
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 14544
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=14544, si_uid=0, si_status=0, si_utime=8, si_stime=1} ---
rt_sigreturn({mask=[]})                 = 14544
dup2(11, 1)                             = 1
close(11)                               = 0
stat("/usr/local/sbin/sync", 0x7ffe69edd560) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/sync", 0x7ffe69edd560) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/sync", 0x7ffe69edd560)  = -1 ENOENT (No such file or directory)
stat("/usr/bin/sync", 0x7ffe69edd560)   = -1 ENOENT (No such file or directory)
stat("/sbin/sync", 0x7ffe69edd560)      = -1 ENOENT (No such file or directory)
stat("/bin/sync", {st_mode=S_IFREG|0755, st_size=31496, ...}) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 14545
wait4(-1,
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=74244120k,nr_inodes=18561030,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=14852988k,mode=755)
rpool/ROOT/pve-1 on / type zfs (rw,relatime,xattr,noacl)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=36,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=4157)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
sunrpc on /run/rpc_pipefs type rpc_pipefs (rw,relatime)
configfs on /sys/kernel/config type configfs (rw,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
rpool on /rpool type zfs (rw,noatime,xattr,noacl)
rpool/ROOT on /rpool/ROOT type zfs (rw,noatime,xattr,noacl)
rpool/data on /rpool/data type zfs (rw,noatime,xattr,noacl)
rpool/data/subvol-102-disk-0 on /rpool/data/subvol-102-disk-0 type zfs (rw,noatime,xattr,posixacl)
rpool/data/subvol-110-disk-0 on /rpool/data/subvol-110-disk-0 type zfs (rw,noatime,xattr,posixacl)
rpool/data/subvol-111-disk-1 on /rpool/data/subvol-111-disk-1 type zfs (rw,noatime,xattr,posixacl)
rpool/data/subvol-120-disk-0 on /rpool/data/subvol-120-disk-0 type zfs (rw,noatime,xattr,posixacl)
rpool/data/subvol-134-disk-1 on /rpool/data/subvol-134-disk-1 type zfs (rw,noatime,xattr,posixacl)
rpool/data/subvol-137-disk-1 on /rpool/data/subvol-137-disk-1 type zfs (rw,noatime,xattr,posixacl)
lxcfs on /var/lib/lxcfs type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
dev-nas-1:/mnt/Pool-NAS/Backups on /mnt/dev-nas-1 type nfs (rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.1.104,mountvers=3,mountport=881,mountproto=udp,local_lock=none,addr=192.168.1.104)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
dev-nas-1:/mnt/Pool-NAS/Proxmox on /mnt/pve/dev-nas-1 type nfs (rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.1.104,mountvers=3,mountport=881,mountproto=udp,local_lock=none,addr=192.168.1.104)
rpool/data/subvol-104-disk-0 on /rpool/data/subvol-104-disk-0 type zfs (rw,noatime,xattr,posixacl)
rpool/data/subvol-101-disk-0 on /rpool/data/subvol-101-disk-0 type zfs (rw,noatime,xattr,posixacl)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=14852984k,mode=700)
rpool/data/subvol-109-disk-0 on /rpool/data/subvol-109-disk-0 type zfs (rw,noatime,xattr,posixacl)
/dev/fuse on /etc/pve type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)
tmpfs on /run/user/1006 type tmpfs (rw,nosuid,nodev,relatime,size=14852984k,mode=700,uid=1006,gid=1006)
tmpfs on /run/user/1001 type tmpfs (rw,nosuid,nodev,relatime,size=14852984k,mode=700,uid=1001,gid=1001)
[Mon Feb  4 11:08:25 2019] INFO: task kworker/u49:1:30643 blocked for more than 120 seconds.
[Mon Feb  4 11:08:25 2019]       Tainted: P        W IO     4.15.18-8-pve #1
[Mon Feb  4 11:08:25 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Mon Feb  4 11:08:25 2019] kworker/u49:1   D    0 30643      2 0x80000000
[Mon Feb  4 11:08:25 2019] Workqueue: writeback wb_workfn (flush-zfs-9)
[Mon Feb  4 11:08:25 2019] Call Trace:
[Mon Feb  4 11:08:25 2019]  __schedule+0x3e0/0x870
[Mon Feb  4 11:08:25 2019]  schedule+0x36/0x80
[Mon Feb  4 11:08:25 2019]  io_schedule+0x16/0x40
[Mon Feb  4 11:08:25 2019]  __lock_page+0xff/0x140
[Mon Feb  4 11:08:25 2019]  ? page_cache_tree_insert+0xe0/0xe0
[Mon Feb  4 11:08:25 2019]  write_cache_pages+0x2e9/0x4b0
[Mon Feb  4 11:08:25 2019]  ? zpl_write_common_iovec+0xe0/0xe0 [zfs]
[Mon Feb  4 11:08:25 2019]  ? _cond_resched+0x1a/0x50
[Mon Feb  4 11:08:25 2019]  ? mutex_lock+0x12/0x40
[Mon Feb  4 11:08:25 2019]  ? rrw_exit+0x5a/0x150 [zfs]
[Mon Feb  4 11:08:25 2019]  zpl_writepages+0x91/0x170 [zfs]
[Mon Feb  4 11:08:25 2019]  do_writepages+0x1f/0x70
[Mon Feb  4 11:08:25 2019]  __writeback_single_inode+0x45/0x330
[Mon Feb  4 11:08:25 2019]  writeback_sb_inodes+0x266/0x590
[Mon Feb  4 11:08:25 2019]  __writeback_inodes_wb+0x92/0xc0
[Mon Feb  4 11:08:25 2019]  wb_writeback+0x288/0x320
[Mon Feb  4 11:08:25 2019]  wb_workfn+0x1a3/0x440
[Mon Feb  4 11:08:25 2019]  ? wb_workfn+0x1a3/0x440
[Mon Feb  4 11:08:25 2019]  process_one_work+0x1e0/0x400
[Mon Feb  4 11:08:25 2019]  worker_thread+0x4b/0x420
[Mon Feb  4 11:08:25 2019]  kthread+0x105/0x140
[Mon Feb  4 11:08:25 2019]  ? process_one_work+0x400/0x400
[Mon Feb  4 11:08:25 2019]  ? kthread_create_worker_on_cpu+0x70/0x70
[Mon Feb  4 11:08:25 2019]  ? do_syscall_64+0x73/0x130
[Mon Feb  4 11:08:25 2019]  ? SyS_exit_group+0x14/0x20
[Mon Feb  4 11:08:25 2019]  ret_from_fork+0x35/0x40
NAME              PROPERTY              VALUE                  SOURCE
rpool/ROOT/pve-1  type                  filesystem             -
rpool/ROOT/pve-1  creation              Fri Oct 27 11:46 2017  -
rpool/ROOT/pve-1  used                  7.23G                  -
rpool/ROOT/pve-1  available             1.08T                  -
rpool/ROOT/pve-1  referenced            7.23G                  -
rpool/ROOT/pve-1  compressratio         1.60x                  -
rpool/ROOT/pve-1  mounted               yes                    -
rpool/ROOT/pve-1  quota                 none                   default
rpool/ROOT/pve-1  reservation           none                   default
rpool/ROOT/pve-1  recordsize            128K                   default
rpool/ROOT/pve-1  mountpoint            /                      local
rpool/ROOT/pve-1  sharenfs              off                    default
rpool/ROOT/pve-1  checksum              on                     default
rpool/ROOT/pve-1  compression           on                     inherited from rpool
rpool/ROOT/pve-1  atime                 off                    inherited from rpool
rpool/ROOT/pve-1  devices               on                     default
rpool/ROOT/pve-1  exec                  on                     default
rpool/ROOT/pve-1  setuid                on                     default
rpool/ROOT/pve-1  readonly              off                    default
rpool/ROOT/pve-1  zoned                 off                    default
rpool/ROOT/pve-1  snapdir               hidden                 default
rpool/ROOT/pve-1  aclinherit            restricted             default
rpool/ROOT/pve-1  createtxg             10                     -
rpool/ROOT/pve-1  canmount              on                     default
rpool/ROOT/pve-1  xattr                 on                     default
rpool/ROOT/pve-1  copies                1                      default
rpool/ROOT/pve-1  version               5                      -
rpool/ROOT/pve-1  utf8only              off                    -
rpool/ROOT/pve-1  normalization         none                   -
rpool/ROOT/pve-1  casesensitivity       sensitive              -
rpool/ROOT/pve-1  vscan                 off                    default
rpool/ROOT/pve-1  nbmand                off                    default
rpool/ROOT/pve-1  sharesmb              off                    default
rpool/ROOT/pve-1  refquota              none                   default
rpool/ROOT/pve-1  refreservation        none                   default
rpool/ROOT/pve-1  guid                  15527408840965499781   -
rpool/ROOT/pve-1  primarycache          all                    default
rpool/ROOT/pve-1  secondarycache        all                    default
rpool/ROOT/pve-1  usedbysnapshots       0B                     -
rpool/ROOT/pve-1  usedbydataset         7.23G                  -
rpool/ROOT/pve-1  usedbychildren        0B                     -
rpool/ROOT/pve-1  usedbyrefreservation  0B                     -
rpool/ROOT/pve-1  logbias               latency                default
rpool/ROOT/pve-1  dedup                 off                    default
rpool/ROOT/pve-1  mlslabel              none                   default
rpool/ROOT/pve-1  sync                  standard               inherited from rpool
rpool/ROOT/pve-1  dnodesize             legacy                 default
rpool/ROOT/pve-1  refcompressratio      1.60x                  -
rpool/ROOT/pve-1  written               7.23G                  -
rpool/ROOT/pve-1  logicalused           11.3G                  -
rpool/ROOT/pve-1  logicalreferenced     11.3G                  -
rpool/ROOT/pve-1  volmode               default                default
rpool/ROOT/pve-1  filesystem_limit      none                   default
rpool/ROOT/pve-1  snapshot_limit        none                   default
rpool/ROOT/pve-1  filesystem_count      none                   default
rpool/ROOT/pve-1  snapshot_count        none                   default
rpool/ROOT/pve-1  snapdev               hidden                 default
rpool/ROOT/pve-1  acltype               off                    default
rpool/ROOT/pve-1  context               none                   default
rpool/ROOT/pve-1  fscontext             none                   default
rpool/ROOT/pve-1  defcontext            none                   default
rpool/ROOT/pve-1  rootcontext           none                   default
rpool/ROOT/pve-1  relatime              on                     temporary
rpool/ROOT/pve-1  redundant_metadata    all                    default
rpool/ROOT/pve-1  overlay               off                    default

Let me know if you need anything else

@CySlider
Copy link

CySlider commented Mar 7, 2019

Happens to me too on my Manjaro 4.19 desktop system with zol 0.7.12-11
Really annoying. once ever few hours. Since my homes (not root) are on the pool a lot starts hanging.
Most annoying I can't even reboot cleanly as the reboot never manages to unmount everything and reboot also hangs.

[87611.236429] INFO: task txg_sync:1006 blocked for more than 120 seconds.
[87611.236434]       Tainted: P           OE     4.19.23-1-MANJARO #1
[87611.236435] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[87611.236437] txg_sync        D    0  1006      2 0x80000080
[87611.236440] Call Trace:
[87611.236448]  ? __schedule+0x29b/0x8b0
[87611.236482]  ? zio_taskq_dispatch+0x6f/0x90 [zfs]
[87611.236484]  schedule+0x32/0x90
[87611.236487]  io_schedule+0x12/0x40
[87611.236493]  cv_wait_common+0xaa/0x130 [spl]
[87611.236497]  ? wait_woken+0x80/0x80
[87611.236524]  zio_wait+0x113/0x1b0 [zfs]
[87611.236548]  dsl_pool_sync+0xb9/0x400 [zfs]
[87611.236576]  spa_sync+0x48b/0xd30 [zfs]
[87611.236604]  txg_sync_thread+0x2c6/0x4a0 [zfs]
[87611.236607]  ? _raw_spin_unlock_irq+0x1d/0x30
[87611.236633]  ? txg_thread_exit.isra.4+0x60/0x60 [zfs]
[87611.236636]  ? __thread_exit+0x20/0x20 [spl]
[87611.236639]  thread_generic_wrapper+0x6f/0x80 [spl]
[87611.236642]  kthread+0x112/0x130
[87611.236644]  ? kthread_park+0x80/0x80
[87611.236646]  ret_from_fork+0x22/0x40
[87611.236779] INFO: task TaskSchedulerFo:9759 blocked for more than 120 seconds.
[87611.236781]       Tainted: P           OE     4.19.23-1-MANJARO #1
[87611.236782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[87611.236783] TaskSchedulerFo D    0  9759   9346 0x00000080
[87611.236785] Call Trace:
[87611.236787]  ? __schedule+0x29b/0x8b0
[87611.236789]  schedule+0x32/0x90
[87611.236790]  io_schedule+0x12/0x40
[87611.236794]  cv_wait_common+0xaa/0x130 [spl]
[87611.236796]  ? wait_woken+0x80/0x80
[87611.236823]  zio_wait+0x113/0x1b0 [zfs]
[87611.236849]  zil_commit.part.8+0x85c/0x940 [zfs]
[87611.236875]  zfs_fsync+0x70/0xd0 [zfs]
[87611.236900]  zpl_fsync+0x63/0x90 [zfs]
[87611.236903]  ? __fget+0x6e/0xa0
[87611.236906]  do_fsync+0x38/0x70
[87611.236908]  __x64_sys_fdatasync+0x13/0x20
[87611.236910]  do_syscall_64+0x65/0x180
[87611.236912]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[87611.236914] RIP: 0033:0x7f28924bf73f
[87611.236919] Code: Bad RIP value.
[87611.236920] RSP: 002b:00007f28810473e0 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
[87611.236922] RAX: ffffffffffffffda RBX: 0000000000000073 RCX: 00007f28924bf73f
[87611.236923] RDX: 0000000000000000 RSI: 000037ea5e9df880 RDI: 0000000000000073
[87611.236924] RBP: 00007f2881047430 R08: 0000000000000000 R09: 00000000f1b86835
[87611.236925] R10: 00000000f1b86835 R11: 0000000000000293 R12: 00007f28924bf700
[87611.236925] R13: 00007f2896b3a3f0 R14: 0000000000000000 R15: 000037ea5e9df8a0
[87734.114347] INFO: task txg_sync:1006 blocked for more than 120 seconds.
[87734.114351]       Tainted: P           OE     4.19.23-1-MANJARO #1
[87734.114352] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[87734.114353] txg_sync        D    0  1006      2 0x80000080
[87734.114355] Call Trace:
[87734.114363]  ? __schedule+0x29b/0x8b0
[87734.114392]  ? zio_taskq_dispatch+0x6f/0x90 [zfs]
[87734.114394]  schedule+0x32/0x90
[87734.114396]  io_schedule+0x12/0x40
[87734.114402]  cv_wait_common+0xaa/0x130 [spl]
[87734.114404]  ? wait_woken+0x80/0x80
[87734.114426]  zio_wait+0x113/0x1b0 [zfs]
[87734.114446]  dsl_pool_sync+0xb9/0x400 [zfs]
[87734.114468]  spa_sync+0x48b/0xd30 [zfs]
[87734.114491]  txg_sync_thread+0x2c6/0x4a0 [zfs]
[87734.114493]  ? _raw_spin_unlock_irq+0x1d/0x30
[87734.114514]  ? txg_thread_exit.isra.4+0x60/0x60 [zfs]
[87734.114517]  ? __thread_exit+0x20/0x20 [spl]
[87734.114519]  thread_generic_wrapper+0x6f/0x80 [spl]
[87734.114522]  kthread+0x112/0x130
[87734.114523]  ? kthread_park+0x80/0x80
[87734.114525]  ret_from_fork+0x22/0x40
[87734.114652] INFO: task TaskSchedulerFo:9759 blocked for more than 120 seconds.
[87734.114653]       Tainted: P           OE     4.19.23-1-MANJARO #1
[87734.114654] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[87734.114655] TaskSchedulerFo D    0  9759   9346 0x00000080
[87734.114656] Call Trace:
[87734.114658]  ? __schedule+0x29b/0x8b0
[87734.114660]  schedule+0x32/0x90
[87734.114661]  io_schedule+0x12/0x40
[87734.114664]  cv_wait_common+0xaa/0x130 [spl]
[87734.114666]  ? wait_woken+0x80/0x80
[87734.114687]  zio_wait+0x113/0x1b0 [zfs]
[87734.114708]  zil_commit.part.8+0x85c/0x940 [zfs]
[87734.114729]  zfs_fsync+0x70/0xd0 [zfs]
[87734.114750]  zpl_fsync+0x63/0x90 [zfs]
[87734.114753]  ? __fget+0x6e/0xa0
[87734.114755]  do_fsync+0x38/0x70
[87734.114756]  __x64_sys_fdatasync+0x13/0x20
[87734.114758]  do_syscall_64+0x65/0x180
[87734.114760]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[87734.114762] RIP: 0033:0x7f28924bf73f
[87734.114766] Code: Bad RIP value.
[87734.114767] RSP: 002b:00007f28810473e0 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
[87734.114769] RAX: ffffffffffffffda RBX: 0000000000000073 RCX: 00007f28924bf73f
[87734.114769] RDX: 0000000000000000 RSI: 000037ea5e9df880 RDI: 0000000000000073
[87734.114770] RBP: 00007f2881047430 R08: 0000000000000000 R09: 00000000f1b86835
[87734.114771] R10: 00000000f1b86835 R11: 0000000000000293 R12: 00007f28924bf700
[87734.114771] R13: 00007f2896b3a3f0 R14: 0000000000000000 R15: 000037ea5e9df8a0
[87734.114773] INFO: task TaskSchedulerFo:24195 blocked for more than 120 seconds.
[87734.114774]       Tainted: P           OE     4.19.23-1-MANJARO #1
[87734.114775] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[87734.114776] TaskSchedulerFo D    0 24195   9346 0x00000080
[87734.114777] Call Trace:
[87734.114779]  ? __schedule+0x29b/0x8b0
[87734.114781]  ? preempt_count_add+0x79/0xb0
[87734.114783]  schedule+0x32/0x90
[87734.114786]  cv_wait_common+0xf6/0x130 [spl]
[87734.114787]  ? wait_woken+0x80/0x80
[87734.114807]  zil_commit.part.8+0x80/0x940 [zfs]
[87734.114809]  ? _raw_spin_lock+0x13/0x40
[87734.114809]  ? _raw_spin_unlock+0x16/0x30
[87734.114812]  ? tsd_set+0x203/0x4f0 [spl]
[87734.114813]  ? preempt_count_add+0x79/0xb0
[87734.114833]  zfs_fsync+0x70/0xd0 [zfs]
[87734.114853]  zpl_fsync+0x63/0x90 [zfs]
[87734.114854]  ? __fget+0x6e/0xa0
[87734.114855]  do_fsync+0x38/0x70
[87734.114857]  __x64_sys_fdatasync+0x13/0x20
[87734.114858]  do_syscall_64+0x65/0x180
[87734.114859]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[87734.114860] RIP: 0033:0x7f28924bf73f
[87734.114861] Code: Bad RIP value.
[87734.114862] RSP: 002b:00007f287f043cf0 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
[87734.114863] RAX: ffffffffffffffda RBX: 000000000000005d RCX: 00007f28924bf73f
[87734.114864] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 000000000000005d
[87734.114864] RBP: 00007f287f043d40 R08: 0000000000000000 R09: 0000000000000080
[87734.114865] R10: 0000000000001000 R11: 0000000000000293 R12: 0000000000000000
[87734.114865] R13: d763a120f905d5d9 R14: 0000000000000000 R15: 0000000000001000
[87734.114878] INFO: task CIPCServer::Thr:13852 blocked for more than 120 seconds.
[87734.114879]       Tainted: P           OE     4.19.23-1-MANJARO #1
[87734.114880] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[87734.114881] CIPCServer::Thr D    0 13852  13418 0x20020080
[87734.114882] Call Trace:
[87734.114884]  ? __schedule+0x29b/0x8b0
[87734.114885]  ? __switch_to_asm+0x40/0x70
[87734.114886]  ? preempt_count_add+0x79/0xb0
[87734.114887]  schedule+0x32/0x90
[87734.114890]  cv_wait_common+0xf6/0x130 [spl]
[87734.114892]  ? wait_woken+0x80/0x80
[87734.114911]  zil_commit.part.8+0x80/0x940 [zfs]
[87734.114913]  ? _raw_spin_lock+0x13/0x40
[87734.114913]  ? _raw_spin_unlock+0x16/0x30
[87734.114916]  ? tsd_set+0x203/0x4f0 [spl]
[87734.114918]  ? preempt_count_add+0x79/0xb0
[87734.114937]  zfs_fsync+0x70/0xd0 [zfs]
[87734.114957]  zpl_fsync+0x63/0x90 [zfs]
[87734.114958]  ? __fget+0x6e/0xa0
[87734.114959]  do_fsync+0x38/0x70
[87734.114960]  __ia32_sys_fdatasync+0x13/0x20
[87734.114962]  do_fast_syscall_32+0xa7/0x2a0
[87734.114963]  entry_SYSCALL_compat_after_hwframe+0x45/0x4d
[87856.992274] INFO: task txg_sync:1006 blocked for more than 120 seconds.
[87856.992279]       Tainted: P           OE     4.19.23-1-MANJARO #1
[87856.992281] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[87856.992284] txg_sync        D    0  1006      2 0x80000080
[87856.992287] Call Trace:
[87856.992297]  ? __schedule+0x29b/0x8b0
[87856.992340]  ? zio_taskq_dispatch+0x6f/0x90 [zfs]
[87856.992343]  schedule+0x32/0x90
[87856.992347]  io_schedule+0x12/0x40
[87856.992354]  cv_wait_common+0xaa/0x130 [spl]
[87856.992358]  ? wait_woken+0x80/0x80
[87856.992396]  zio_wait+0x113/0x1b0 [zfs]
[87856.992431]  dsl_pool_sync+0xb9/0x400 [zfs]
[87856.992473]  spa_sync+0x48b/0xd30 [zfs]
[87856.992513]  txg_sync_thread+0x2c6/0x4a0 [zfs]
[87856.992516]  ? _raw_spin_unlock_irq+0x1d/0x30
[87856.992553]  ? txg_thread_exit.isra.4+0x60/0x60 [zfs]
[87856.992558]  ? __thread_exit+0x20/0x20 [spl]
[87856.992562]  thread_generic_wrapper+0x6f/0x80 [spl]
[87856.992565]  kthread+0x112/0x130
[87856.992567]  ? kthread_park+0x80/0x80
[87856.992570]  ret_from_fork+0x22/0x40
[87856.992714] INFO: task TaskSchedulerFo:9759 blocked for more than 120 seconds.
[87856.992717]       Tainted: P           OE     4.19.23-1-MANJARO #1
[87856.992718] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[87856.992720] TaskSchedulerFo D    0  9759   9346 0x00000080
[87856.992722] Call Trace:
[87856.992725]  ? __schedule+0x29b/0x8b0
[87856.992728]  schedule+0x32/0x90
[87856.992730]  io_schedule+0x12/0x40
[87856.992735]  cv_wait_common+0xaa/0x130 [spl]
[87856.992738]  ? wait_woken+0x80/0x80
[87856.992775]  zio_wait+0x113/0x1b0 [zfs]
[87856.992811]  zil_commit.part.8+0x85c/0x940 [zfs]
[87856.992848]  zfs_fsync+0x70/0xd0 [zfs]
[87856.992884]  zpl_fsync+0x63/0x90 [zfs]
[87856.992888]  ? __fget+0x6e/0xa0
[87856.992891]  do_fsync+0x38/0x70
[87856.992893]  __x64_sys_fdatasync+0x13/0x20
[87856.992896]  do_syscall_64+0x65/0x180
[87856.992899]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[87856.992901] RIP: 0033:0x7f28924bf73f
[87856.992907] Code: Bad RIP value.
[87856.992908] RSP: 002b:00007f28810473e0 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
[87856.992910] RAX: ffffffffffffffda RBX: 0000000000000073 RCX: 00007f28924bf73f
[87856.992911] RDX: 0000000000000000 RSI: 000037ea5e9df880 RDI: 0000000000000073
[87856.992912] RBP: 00007f2881047430 R08: 0000000000000000 R09: 00000000f1b86835
[87856.992913] R10: 00000000f1b86835 R11: 0000000000000293 R12: 00007f28924bf700
[87856.992914] R13: 00007f2896b3a3f0 R14: 0000000000000000 R15: 000037ea5e9df8a0
[87856.992918] INFO: task TaskSchedulerFo:24195 blocked for more than 120 seconds.
[87856.992920]       Tainted: P           OE     4.19.23-1-MANJARO #1
[87856.992921] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[87856.992922] TaskSchedulerFo D    0 24195   9346 0x00000080
[87856.992924] Call Trace:
[87856.992927]  ? __schedule+0x29b/0x8b0
[87856.992930]  ? preempt_count_add+0x79/0xb0
[87856.992933]  schedule+0x32/0x90
[87856.992938]  cv_wait_common+0xf6/0x130 [spl]
[87856.992940]  ? wait_woken+0x80/0x80
[87856.992975]  zil_commit.part.8+0x80/0x940 [zfs]
[87856.992978]  ? _raw_spin_lock+0x13/0x40
[87856.992979]  ? _raw_spin_unlock+0x16/0x30
[87856.992984]  ? tsd_set+0x203/0x4f0 [spl]
[87856.992986]  ? preempt_count_add+0x79/0xb0
[87856.993021]  zfs_fsync+0x70/0xd0 [zfs]
[87856.993056]  zpl_fsync+0x63/0x90 [zfs]
[87856.993058]  ? __fget+0x6e/0xa0
[87856.993060]  do_fsync+0x38/0x70
[87856.993062]  __x64_sys_fdatasync+0x13/0x20
[87856.993064]  do_syscall_64+0x65/0x180
[87856.993067]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[87856.993068] RIP: 0033:0x7f28924bf73f
[87856.993071] Code: Bad RIP value.
[87856.993071] RSP: 002b:00007f287f043cf0 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
[87856.993073] RAX: ffffffffffffffda RBX: 000000000000005d RCX: 00007f28924bf73f
[87856.993074] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 000000000000005d
[87856.993075] RBP: 00007f287f043d40 R08: 0000000000000000 R09: 0000000000000080
[87856.993076] R10: 0000000000001000 R11: 0000000000000293 R12: 0000000000000000
[87856.993077] R13: d763a120f905d5d9 R14: 0000000000000000 R15: 0000000000001000
[87856.993092] INFO: task CIPCServer::Thr:13852 blocked for more than 120 seconds.
[87856.993094]       Tainted: P           OE     4.19.23-1-MANJARO #1
[87856.993096] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[87856.993097] CIPCServer::Thr D    0 13852  13418 0x20020084
[87856.993099] Call Trace:
[87856.993102]  ? __schedule+0x29b/0x8b0
[87856.993104]  ? __switch_to_asm+0x40/0x70
[87856.993106]  ? preempt_count_add+0x79/0xb0
[87856.993109]  schedule+0x32/0x90
[87856.993113]  cv_wait_common+0xf6/0x130 [spl]
[87856.993116]  ? wait_woken+0x80/0x80
[87856.993150]  zil_commit.part.8+0x80/0x940 [zfs]
[87856.993153]  ? _raw_spin_lock+0x13/0x40
[87856.993154]  ? _raw_spin_unlock+0x16/0x30
[87856.993159]  ? tsd_set+0x203/0x4f0 [spl]
[87856.993161]  ? preempt_count_add+0x79/0xb0
[87856.993196]  zfs_fsync+0x70/0xd0 [zfs]
[87856.993230]  zpl_fsync+0x63/0x90 [zfs]
[87856.993232]  ? __fget+0x6e/0xa0
[87856.993234]  do_fsync+0x38/0x70
[87856.993236]  __ia32_sys_fdatasync+0x13/0x20
[87856.993238]  do_fast_syscall_32+0xa7/0x2a0
[87856.993241]  entry_SYSCALL_compat_after_hwframe+0x45/0x4d 

Also the process is still stuck in D even waiting far longer:

$ ps aux | grep D
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
...
root      1006  0.0  0.0      0     0 ?        D    Feb23   0:19 [txg_sync] 

My 32Gb memory is still only 40% full

S.M.A.R.T data of all disks is fine and it seems the unmount is stuck at random disks each time.

My pool also is fine:


  pool: tank
 state: ONLINE
  scan: resilvered 11,8T in 13h51m with 0 errors on Wed Jan  2 02:00:28 2019
config:

    NAME                                              STATE READ WRITE CKSUM
    tank                                              ONLINE 0     0     0
      raidz2-0                                        ONLINE 0     0     0
        sdx_crypt10                                   ONLINE 0     0     0
        sdx_crypt11                                   ONLINE 0     0     0
        sdx_crypt7                                    ONLINE 0     0     0
        sdx_crypt6                                    ONLINE 0     0     0
    cache
      nvme-Samsung_SSD_960_EVO_500GB_S3EUNX0J215828E  ONLINE 0     0     0

errors: No known data errors

My pool consists of 4 10Tb LUKS encrypted disks. And one m.2 NVME drive as cache.

Here my iostat -mx output:

iostat -mx
Linux 4.19.23-1-MANJARO (I-KNOW-YOU.torgato.de)     25.02.2019 _x86_64_    (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5,86    0,00    1,90    4,02    0,00   88,22

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm  %util
nvme1n1          0,66    3,77      0,02      0,10     0,01 2,92   0,91  43,67    0,33    1,19   0,61    24,97    26,17 136,18  60,30
nvme0n1          0,03    0,99      0,00      0,08     0,00 0,00   0,00   0,00    0,44    2,01   0,60    36,25    81,47 592,53  60,21
sda              0,22    4,78      0,01      0,10     0,00 0,03   0,00   0,71    4,66    2,96   0,00    47,71    21,38 0,43   0,22
sdb              0,22    4,80      0,01      0,10     0,00 0,03   0,00   0,72    4,58    3,01   0,00    47,46    21,32 0,43   0,22
sdc              0,22    4,78      0,01      0,10     0,00 0,03   0,00   0,72    4,63    2,96   0,00    47,53    21,37 0,43   0,22
sdd              0,00    0,00      0,00      0,00     0,00 0,00   0,00   0,00    1,69    0,00   0,00    19,63     0,00 1,65   0,00
sde              0,22    4,80      0,01      0,10     0,00 0,03   0,00   0,71    4,53    2,84   0,00    47,66    21,26 0,43   0,22
dm-0             0,51    4,54      0,01      0,10     0,00 0,00   0,00   0,00   23,68    0,73   0,02    20,92    22,54 2,59   1,31
dm-1             0,51    4,55      0,01      0,10     0,00 0,00   0,00   0,00   22,65    0,72   0,01    20,97    22,43 2,48   1,26
dm-2             0,51    4,55      0,01      0,10     0,00 0,00   0,00   0,00   24,22    0,73   0,02    20,96    22,48 2,65   1,34
dm-3             0,51    4,53      0,01      0,10     0,00 0,00   0,00   0,00   23,64    0,73   0,28    21,04    22,56 20,12  10,14 

Shutdown log:

shutdown log:

Feb 23 01:12:39 I-KNOW-YOU.torgato.de systemd[1]: Stopped Cryptography Setup for sdx_crypt10.
Feb 23 01:12:39 I-KNOW-YOU.torgato.de audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-cryptsetup@sdx_crypt10 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Feb 23 01:12:39 I-KNOW-YOU.torgato.de kernel: audit: type=1131 audit(1550880759.194:135): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-cryptsetup@sdx_crypt10 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Feb 23 01:12:39 I-KNOW-YOU.torgato.de systemd-cryptsetup[28103]: device-mapper: remove ioctl on sdx_crypt6  failed: Device or resource busy
Feb 23 01:12:39 I-KNOW-YOU.torgato.de systemd-cryptsetup[28103]: Device sdx_crypt6 is still in use.
Feb 23 01:12:39 I-KNOW-YOU.torgato.de systemd-cryptsetup[28103]: Failed to deactivate: Device or resource busy
Feb 23 01:12:39 I-KNOW-YOU.torgato.de systemd[1]: systemd-cryptsetup@sdx_crypt6.service: Control process exited, code=exited, status=1/FAILURE
Feb 23 01:12:39 I-KNOW-YOU.torgato.de systemd[1]: systemd-cryptsetup@sdx_crypt6.service: Failed with result 'exit-code'.
Feb 23 01:12:39 I-KNOW-YOU.torgato.de systemd[1]: Stopped Cryptography Setup for sdx_crypt6.
Feb 23 01:12:39 I-KNOW-YOU.torgato.de audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-cryptsetup@sdx_crypt6 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Feb 23 01:12:39 I-KNOW-YOU.torgato.de kernel: audit: type=1131 audit(1550880759.217:136): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-cryptsetup@sdx_crypt6 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: Received SIGINT.
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: Condition check resulted in Generate shutdown-ramfs being skipped.
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: Unmounting /tank/Torge...
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: Unmounting Temporary Directory (/tmp)...
Feb 23 01:13:06 I-KNOW-YOU.torgato.de umount[28167]: umount: /tank/Torge: target is busy.
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: tank-Torge.mount: Mount process exited, code=exited, status=32/n/a
Feb 23 01:13:06 I-KNOW-YOU.torgato.de umount[28168]: umount: /tmp: target is busy.
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: Failed unmounting /tank/Torge.
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: tmp.mount: Mount process exited, code=exited, status=32/n/a
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: Failed unmounting Temporary Directory (/tmp).
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: Received SIGINT.
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: Unmounting Temporary Directory (/tmp)...
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: Condition check resulted in Generate shutdown-ramfs being skipped.
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: Unmounting /tank/Torge...
Feb 23 01:13:06 I-KNOW-YOU.torgato.de umount[28170]: umount: /tmp: target is busy.
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: tmp.mount: Mount process exited, code=exited, status=32/n/a
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: Failed unmounting Temporary Directory (/tmp).
Feb 23 01:13:06 I-KNOW-YOU.torgato.de umount[28171]: umount: /tank/Torge: target is busy.
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: tank-Torge.mount: Mount process exited, code=exited, status=32/n/a
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: Failed unmounting /tank/Torge.
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: Received SIGINT.
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: Condition check resulted in Generate shutdown-ramfs being skipped.
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: Unmounting /tank/Torge...
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: Unmounting Temporary Directory (/tmp)...
Feb 23 01:13:06 I-KNOW-YOU.torgato.de umount[28174]: umount: /tank/Torge: target is busy.
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: tank-Torge.mount: Mount process exited, code=exited, status=32/n/a
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: Failed unmounting /tank/Torge.
Feb 23 01:13:06 I-KNOW-YOU.torgato.de umount[28175]: umount: /tmp: target is busy.
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: tmp.mount: Mount process exited, code=exited, status=32/n/a
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: Failed unmounting Temporary Directory (/tmp).
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: Received SIGINT.
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: Unmounting /tank/Torge...
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: Unmounting Temporary Directory (/tmp)...
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: Condition check resulted in Generate shutdown-ramfs being skipped.
Feb 23 01:13:06 I-KNOW-YOU.torgato.de umount[28177]: umount: /tank/Torge: target is busy.
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: tank-Torge.mount: Mount process exited, code=exited, status=32/n/a
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: Failed unmounting /tank/Torge.
Feb 23 01:13:06 I-KNOW-YOU.torgato.de umount[28178]: umount: /tmp: target is busy.
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: tmp.mount: Mount process exited, code=exited, status=32/n/a
Feb 23 01:13:06 I-KNOW-YOU.torgato.de systemd[1]: Failed unmounting Temporary Directory (/tmp).
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: Received SIGINT.
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: Condition check resulted in Generate shutdown-ramfs being skipped.
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: Unmounting Temporary Directory (/tmp)...
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: Unmounting /tank/Torge...
Feb 23 01:13:07 I-KNOW-YOU.torgato.de umount[28181]: umount: /tmp: target is busy.
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: tmp.mount: Mount process exited, code=exited, status=32/n/a
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: Failed unmounting Temporary Directory (/tmp).
Feb 23 01:13:07 I-KNOW-YOU.torgato.de umount[28182]: umount: /tank/Torge: target is busy.
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: tank-Torge.mount: Mount process exited, code=exited, status=32/n/a
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: Failed unmounting /tank/Torge.
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: Received SIGINT.
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: Condition check resulted in Generate shutdown-ramfs being skipped.
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: Unmounting /tank/Torge...
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: Unmounting Temporary Directory (/tmp)...
Feb 23 01:13:07 I-KNOW-YOU.torgato.de umount[28184]: umount: /tank/Torge: target is busy.
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: tank-Torge.mount: Mount process exited, code=exited, status=32/n/a
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: Failed unmounting /tank/Torge.
Feb 23 01:13:07 I-KNOW-YOU.torgato.de umount[28185]: umount: /tmp: target is busy.
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: tmp.mount: Mount process exited, code=exited, status=32/n/a
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: Failed unmounting Temporary Directory (/tmp).
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: Received SIGINT.
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: Condition check resulted in Generate shutdown-ramfs being skipped.
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: Unmounting Temporary Directory (/tmp)...
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: Unmounting /tank/Torge...
Feb 23 01:13:07 I-KNOW-YOU.torgato.de umount[28187]: umount: /tmp: target is busy.
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: tmp.mount: Mount process exited, code=exited, status=32/n/a
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: Failed unmounting Temporary Directory (/tmp).
Feb 23 01:13:07 I-KNOW-YOU.torgato.de umount[28188]: umount: /tank/Torge: target is busy.
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: tank-Torge.mount: Mount process exited, code=exited, status=32/n/a
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: Failed unmounting /tank/Torge.
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: Received SIGINT.
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: Forcibly rebooting: Ctrl-Alt-Del was pressed more than 7 times within 2s
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd[1]: Shutting down.
Feb 23 01:13:07 I-KNOW-YOU.torgato.de kernel: systemd-shutdow: 49 output lines suppressed due to ratelimiting
Feb 23 01:13:07 I-KNOW-YOU.torgato.de systemd-shutdown[1]: Syncing filesystems and block devices. 
   HERE IT HANGS

Smart data sdb/sdx_crypt6

[I-KNOW-YOU log]# smartctl -a /dev/sdb
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-4.19.23-1-MANJARO] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST10000VN0004-1ZD101
Serial Number:    ZA20SV9S
LU WWN Device Id: 5 000c50 09396796e
Firmware Version: SC60
User Capacity:    10.000.831.348.736 bytes [10,0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Feb 25 00:35:55 2019 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (  584) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      ( 881) minutes.
Conveyance self-test routine
recommended polling time:      (   2) minutes.
SCT capabilities:            (0x50bd)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   083   064   044    Pre-fail Always       -       190984434
  3 Spin_Up_Time            0x0003   088   086   000    Pre-fail Always       -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age Always       -       1689
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail Always       -       0
  7 Seek_Error_Rate         0x000f   091   060   045    Pre-fail Always       -       1383480407
  9 Power_On_Hours          0x0032   088   088   000    Old_age Always       -       10784 (194 247 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   020    Old_age Always       -       1028
184 End-to-End_Error        0x0032   100   100   099    Old_age Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age Always       -       0
188 Command_Timeout         0x0032   100   096   000    Old_age Always       -       2851901801115
189 High_Fly_Writes         0x003a   085   085   000    Old_age Always       -       15
190 Airflow_Temperature_Cel 0x0022   067   045   040    Old_age Always       -       33 (Min/Max 18/39)
191 G-Sense_Error_Rate      0x0032   099   099   000    Old_age Always       -       3678
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age Always       -       76
193 Load_Cycle_Count        0x0032   082   082   000    Old_age Always       -       37441
194 Temperature_Celsius     0x0022   033   055   000    Old_age Always       -       33 (0 18 0 0 0)
195 Hardware_ECC_Recovered  0x001a   011   001   000    Old_age Always       -       190984434
197 Current_Pending_Sector  0x0012   100   100   000    Old_age Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age Always       -       0
200 Multi_Zone_Error_Rate   0x0023   100   100   001    Pre-fail Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age Offline      -       5122 (129 181 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age Offline      -       56499736146
242 Total_LBAs_Read         0x0000   100   253   000    Old_age Offline      -       254319596354

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay. 

Please this is so damn annoying. Any data I can collect to help get to the bottom of this? Why is the kernel not even able to reboot?

@CySlider
Copy link

CySlider commented Mar 9, 2019

Jonathon from the Manjaro Team had me check the i/o scheduler

I did and it turned out, that while zfs did set the i/o scheduler of the dm cryptsetup devices to none (I'm using luks encrypted disks) it did not manage to change the scheduler of the underlying physical sdx disks to none as well. They were set to mq-deadline instead

I now changed all of them to none as well and until now the problem did not reappear. It is far from long enough to be sure, but it is definitely already a notable time that passed and my hopes are up.

So if you have this issue and you use LUKS devices, check that the io scheduler is set to none on both layers and if not change it.

Update: can also be "noop" on your system

@denizzzka
Copy link

denizzzka commented Mar 9, 2019

(I'm using luks encrypted disks)

Hmmm, me too! Is anybody here who NOT used LUKS but faced with this issue?

@CySlider you use ZFS pool on LUKS or LUKS vdev on ZFS?

@denizzzka
Copy link

check that the io scheduler is set to none

Scheduler for HDDs is called "noop", not "none". Maybe this is important.

(So, I changed it to "noop" and stay fingers crossed)

@CySlider
Copy link

CySlider commented Mar 9, 2019

On my system it is "none" but you are right it might also be called "noop". You can check the available modes with:
cat /sys/block/THE_DEVICE/queue/scheduler

@bglod
Copy link

bglod commented Mar 9, 2019

@denizzzka I am NOT using LUKS, but I am using ZFS native encryption.

The scheduler for all (12) devices is set to mq-deadline. I will set these to none and see what happens!

@prgwiz
Copy link

prgwiz commented Sep 24, 2019 via email

@Ninpo
Copy link

Ninpo commented Oct 12, 2019

I've just had this with 0.8.2 on 5.2.16 kernel. I was running a world update on Gentoo when the process stopped doing anything and kworker/u256:3+flush-zfs-1 was sat using 100% of a core. ranlib and emerge were the two deadlocking processes being chucked into dmesg. All zfs system except for the fat32 uefi partition, rpool is a mirror on nvme, (where the hang happened), misc is a 3 rust disk raid-z with ssd as l2arc/zil.

@denizzzka
Copy link

It seems to me that this is somehow related to the swap. When it is become heavy used, but not all disks have the same IO speed, this happens.

@Ninpo
Copy link

Ninpo commented Oct 12, 2019

I don't have swap enabled.

@bud4
Copy link

bud4 commented Oct 14, 2019

I would suggest that you provide scripts to use in these cases. so that it is possible to gather all the information you need.

@Ninpo
Copy link

Ninpo commented Oct 15, 2019

Does anyone suffering this issue make use of FUSE or AppImages mounted via FUSE? In #9430 we've found two of us have that in common.

@ReimuHakurei
Copy link

Additionally, what processors are you all using? In #9430 we also found us both using AMD processors to be in common.

@Anteru
Copy link

Anteru commented Oct 15, 2019

I'm not using FUSE or AppImages (unless Ubuntu server uses FUSE by default for something). I've seen this on EPYC 7601 FWIW.

@bglod
Copy link

bglod commented Oct 15, 2019

Does anyone suffering this issue make use of FUSE or AppImages mounted via FUSE? In #9430 we've found two of us have that in common.

No to both.

Additionally, what processors are you all using? In #9430 we also found us both using AMD processors to be in common.

System has two processors, both are Intel Xeon E5-2660 @ 2.20GHz.

@denizzzka
Copy link

denizzzka commented Oct 31, 2019

@Ninpo

I don't have swap enabled.

Ok, but maybe you (same as me) use pool with different HDDs where not all disks have the same IO speed? I use one slow "green" WD drive and 2 another more faster HDDs.

If I switch slow partition to "offline" this hangs will become less frequent.

@Anteru
Copy link

Anteru commented Oct 31, 2019

At least my HDDs are all the same, and some people experienced this will all NVMe setup as well.

@Ninpo
Copy link

Ninpo commented Nov 2, 2019

On one of my pools the SSDs being used for ZIL/metadata caching are different drives but the sync problem is happens on my main rpool zpool which has identical NVMe drives in.

@camaer
Copy link

camaer commented Nov 20, 2019

We are using Dell R720 with 2x Xeon E5-2670 and 512GB Ram. The HBA is an LSI 9207 with 4 KINGSTON SEDC500M960G in stripped mirror configuration. No L2ARC, Slog and no swap either. The problem happens when rolling back LXC container hosting Postgres DBs which use their own dataset for their disk.

zfs send is also used often on the ZFS dataset that is being rollbacked. Could it be a race condition ?

ZFS 0.8.2

@camaer
Copy link

camaer commented Nov 20, 2019

@kpande It's the same issue as in this thread. I posted my specs to maybe help pinpoint some similarities in the hardware between everyone experiencing this issue.

@jimmyfi
Copy link

jimmyfi commented Apr 25, 2020

I had the same issue but when I disabled cache and log and we set synchronization to standard after long debugging the problem went away. (cache and log were both SSD drives)

@denizzzka
Copy link

I moved swap into another ssd drive and problem away
(I know what it is not recommentded to store swap at ZFS for now)

@bud4
Copy link

bud4 commented Jul 9, 2020

once again in production. the situation is truly critical. zfs version 0.8.1-1ubuntu14.4
Linux zfs-sch 5.3.0-62-generic #56~18.04.1-Ubuntu SMP

sync stack :
[<0>] dmu_tx_delay.isra.8+0x138/0x140 [zfs]
[<0>] dmu_tx_wait+0xb1/0x210 [zfs]
[<0>] dmu_tx_assign+0x49/0x70 [zfs]
[<0>] dmu_sync_late_arrival+0x53/0x160 [zfs]
[<0>] dmu_sync+0x1de/0x4e0 [zfs]
[<0>] zfs_get_data+0x1a8/0x350 [zfs]
[<0>] zil_lwb_commit+0x1cd/0x360 [zfs]
[<0>] zil_process_commit_list+0xea/0x1f0 [zfs]
[<0>] zil_commit_writer.isra.14+0xa1/0xb0 [zfs]
[<0>] zil_commit_impl+0x57/0x90 [zfs]
[<0>] zil_commit+0x3d/0x60 [zfs]
[<0>] zfs_sync+0x6d/0xd0 [zfs]
[<0>] zpl_sync_fs+0x46/0x70 [zfs]
[<0>] sync_fs_one_sb+0x23/0x30
[<0>] iterate_supers+0xa2/0x110
[<0>] ksys_sync+0x62/0xb0
[<0>] __ia32_sys_sync+0xe/0x20
[<0>] do_syscall_64+0x5a/0x130
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Proces in state D
29378 931 nobody D 4.8 smbd dmu_tx_delay.isra.8
2 3051 root D 1.0 txg_sync cv_timedwait_common
29378 4574 s.paoli+ D 1.3 smbd dmu_tx_delay.isra.8
29378 5462 nobody D 4.1 smbd cv_timedwait_common
29378 6407 nobody D 1.2 smbd dmu_tx_delay.isra.8
29378 9449 nobody D 2.6 smbd dmu_tx_delay.isra.8
29378 11204 philips D 3.7 smbd dmu_tx_delay.isra.8
29378 12025 nobody D 2.2 smbd dmu_tx_delay.isra.8
1 17148 nobody D 4.6 smbd dmu_tx_delay.isra.8
17769 23245 root D+ 3.3 sync dmu_tx_delay.isra.8

dmes

[ 243.613937] INFO: task zpool:6002 blocked for more than 120 seconds.
[ 243.613980] Tainted: P O 5.3.0-62-generic #5618.04.1-Ubuntu
[ 243.614028] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 243.614079] zpool D 0 6002 1 0x00004002
[ 243.614081] Call Trace:
[ 243.614085] __schedule+0x2a8/0x670
[ 243.614089] ? autoremove_wake_function+0x12/0x40
[ 243.614092] schedule+0x33/0xa0
[ 243.614094] io_schedule+0x16/0x40
[ 243.614102] cv_wait_common+0xdb/0x160 [spl]
[ 243.614105] ? wait_woken+0x80/0x80
[ 243.614112] __cv_wait_io+0x18/0x20 [spl]
[ 243.614223] txg_wait_synced+0x88/0xd0 [zfs]
[ 243.614332] spa_config_update+0x117/0x160 [zfs]
[ 243.614440] spa_import+0x645/0x7b0 [zfs]
[ 243.614553] zfs_ioc_pool_import+0x13c/0x150 [zfs]
[ 243.614667] zfsdev_ioctl+0x584/0x620 [zfs]
[ 243.614671] do_vfs_ioctl+0xa9/0x640
[ 243.614677] ? handle_mm_fault+0xcb/0x210
[ 243.614680] ksys_ioctl+0x75/0x80
[ 243.614683] __x64_sys_ioctl+0x1a/0x20
[ 243.614688] do_syscall_64+0x5a/0x130
[ 243.614692] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 243.614694] RIP: 0033:0x7f19f61826d7
[ 243.614702] Code: Bad RIP value.
[ 243.614704] RSP: 002b:00007fff9cf6bec8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 243.614707] RAX: ffffffffffffffda RBX: 00007fff9cf6f500 RCX: 00007f19f61826d7
[ 243.614708] RDX: 00007fff9cf6f500 RSI: 0000000000005a02 RDI: 0000000000000003
[ 243.614709] RBP: 0000000000005a02 R08: 0000000000005a02 R09: 0000000000000000
[ 243.614711] R10: 0000000000000022 R11: 0000000000000246 R12: 00005616bdb13660
[ 243.614712] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000008
[ 364.445890] INFO: task l2arc_feed:2110 blocked for more than 241 seconds.
[ 364.445947] Tainted: P O 5.3.0-62-generic #56
18.04.1-Ubuntu
[ 364.445996] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 364.446048] l2arc_feed D 0 2110 2 0x80004000
[ 364.446051] Call Trace:
[ 364.446061] __schedule+0x2a8/0x670
[ 364.446067] schedule+0x33/0xa0
[ 364.446069] schedule_preempt_disabled+0xe/0x10
[ 364.446073] __mutex_lock.isra.9+0x26d/0x4e0
[ 364.446187] ? l2arc_evict+0x290/0x290 [zfs]
[ 364.446191] __mutex_lock_slowpath+0x13/0x20
[ 364.446193] ? __mutex_lock_slowpath+0x13/0x20
[ 364.446196] mutex_lock+0x2f/0x40
[ 364.446258] l2arc_dev_get_next+0x1a/0x1e0 [zfs]
[ 364.446320] ? l2arc_evict+0x290/0x290 [zfs]
[ 364.446382] l2arc_feed_thread+0xd6/0x250 [zfs]
[ 364.446443] ? l2arc_evict+0x290/0x290 [zfs]
[ 364.446458] thread_generic_wrapper+0x74/0x90 [spl]
[ 364.446464] kthread+0x121/0x140
[ 364.446474] ? __thread_exit+0x20/0x20 [spl]
[ 364.446476] ? kthread_park+0xb0/0xb0
[ 364.446481] ret_from_fork+0x35/0x40
[ 606.109902] INFO: task spl_system_task:1487 blocked for more than 120 seconds.
[ 606.109959] Tainted: P O 5.3.0-62-generic #5618.04.1-Ubuntu
[ 606.110008] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 606.110059] spl_system_task D 0 1487 2 0x80004000
[ 606.110063] Call Trace:
[ 606.110072] __schedule+0x2a8/0x670
[ 606.110077] schedule+0x33/0xa0
[ 606.110080] schedule_preempt_disabled+0xe/0x10
[ 606.110083] __mutex_lock.isra.9+0x26d/0x4e0
[ 606.110090] ? update_load_avg+0x17d/0x680
[ 606.110092] ? update_load_avg+0x17d/0x680
[ 606.110095] __mutex_lock_slowpath+0x13/0x20
[ 606.110098] ? __mutex_lock_slowpath+0x13/0x20
[ 606.110100] mutex_lock+0x2f/0x40
[ 606.110256] spa_open_common+0x68/0x500 [zfs]
[ 606.110261] ? __switch_to_asm+0x40/0x70
[ 606.110264] ? __switch_to_asm+0x34/0x70
[ 606.110267] ? __switch_to_asm+0x40/0x70
[ 606.110269] ? __switch_to_asm+0x34/0x70
[ 606.110382] spa_open+0x13/0x20 [zfs]
[ 606.110481] dsl_pool_hold+0x33/0x80 [zfs]
[ 606.110562] dmu_objset_own+0x46/0x190 [zfs]
[ 606.110678] zvol_prefetch_minors_impl+0x4b/0xa0 [zfs]
[ 606.110694] taskq_thread+0x210/0x400 [spl]
[ 606.110698] ? __switch_to_asm+0x40/0x70
[ 606.110704] ? wake_up_q+0x80/0x80
[ 606.110710] kthread+0x121/0x140
[ 606.110720] ? task_done+0x90/0x90 [spl]
[ 606.110722] ? kthread_park+0xb0/0xb0
[ 606.110726] ret_from_fork+0x35/0x40
[ 606.110733] INFO: task l2arc_feed:2110 blocked for more than 120 seconds.
[ 606.110779] Tainted: P O 5.3.0-62-generic #56
18.04.1-Ubuntu
[ 606.110828] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 606.110878] l2arc_feed D 0 2110 2 0x80004000
[ 606.110881] Call Trace:
[ 606.110884] __schedule+0x2a8/0x670
[ 606.110888] schedule+0x33/0xa0
[ 606.110890] schedule_preempt_disabled+0xe/0x10
[ 606.110893] __mutex_lock.isra.9+0x26d/0x4e0
[ 606.110896] __mutex_lock_slowpath+0x13/0x20
[ 606.110899] ? __mutex_lock_slowpath+0x13/0x20
[ 606.110901] mutex_lock+0x2f/0x40
[ 606.110964] l2arc_dev_get_next+0x1a/0x1e0 [zfs]
[ 606.111026] l2arc_feed_thread+0xd6/0x250 [zfs]
[ 606.111088] ? l2arc_evict+0x290/0x290 [zfs]
[ 606.111100] thread_generic_wrapper+0x74/0x90 [spl]
[ 606.111103] kthread+0x121/0x140
[ 606.111114] ? __thread_exit+0x20/0x20 [spl]
[ 606.111117] ? kthread_park+0xb0/0xb0
[ 606.111121] ret_from_fork+0x35/0x40
[ 606.111129] INFO: task zpool:6002 blocked for more than 120 seconds.
[ 606.111174] Tainted: P O 5.3.0-62-generic #5618.04.1-Ubuntu
[ 606.111223] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 606.111275] zpool D 0 6002 1 0x00000000
[ 606.111278] Call Trace:
[ 606.111281] __schedule+0x2a8/0x670
[ 606.111285] schedule+0x33/0xa0
[ 606.111287] schedule_preempt_disabled+0xe/0x10
[ 606.111290] __mutex_lock.isra.9+0x26d/0x4e0
[ 606.111294] __mutex_lock_slowpath+0x13/0x20
[ 606.111296] ? __mutex_lock_slowpath+0x13/0x20
[ 606.111298] mutex_lock+0x2f/0x40
[ 606.111413] spa_open_common+0x68/0x500 [zfs]
[ 606.111527] spa_get_stats+0x4d/0x300 [zfs]
[ 606.111530] ? _cond_resched+0x19/0x40
[ 606.111651] zfs_ioc_pool_stats+0x39/0x90 [zfs]
[ 606.111773] zfsdev_ioctl+0x584/0x620 [zfs]
[ 606.111777] do_vfs_ioctl+0xa9/0x640
[ 606.111784] ? handle_mm_fault+0xcb/0x210
[ 606.111786] ksys_ioctl+0x75/0x80
[ 606.111789] __x64_sys_ioctl+0x1a/0x20
[ 606.111795] do_syscall_64+0x5a/0x130
[ 606.111799] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 606.111802] RIP: 0033:0x7f19f61826d7
[ 606.111809] Code: Bad RIP value.
[ 606.111810] RSP: 002b:00007fff9cf6be48 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 606.111813] RAX: ffffffffffffffda RBX: 00007fff9cf6be70 RCX: 00007f19f61826d7
[ 606.111815] RDX: 00007fff9cf6be70 RSI: 0000000000005a05 RDI: 0000000000000003
[ 606.111816] RBP: 00005616bdbb95b0 R08: 00005616bdbb9700 R09: 0000000000000000
[ 606.111817] R10: 00005616bdb12010 R11: 0000000000000246 R12: 00005616bdb13660
[ 606.111819] R13: 0000000000000000 R14: 00007fff9cf6f464 R15: 0000000000000008
[ 606.111832] INFO: task z_zvol:7381 blocked for more than 120 seconds.
[ 606.111876] Tainted: P O 5.3.0-62-generic #56
18.04.1-Ubuntu
[ 606.111924] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 606.111975] z_zvol D 0 7381 2 0x80004000
[ 606.111977] Call Trace:
[ 606.111981] __schedule+0x2a8/0x670
[ 606.111984] schedule+0x33/0xa0
[ 606.111986] schedule_preempt_disabled+0xe/0x10
[ 606.111989] __mutex_lock.isra.9+0x26d/0x4e0
[ 606.112057] ? do_raw_spin_unlock+0x9/0x10 [zfs]
[ 606.112124] ? dbuf_rele_and_unlock+0x17d/0x4f0 [zfs]
[ 606.112132] ? cv_destroy_wakeup+0x25/0x40 [spl]
[ 606.112136] __mutex_lock_slowpath+0x13/0x20
[ 606.112138] ? __mutex_lock_slowpath+0x13/0x20
[ 606.112140] mutex_lock+0x2f/0x40
[ 606.112252] spa_open_common+0x68/0x500 [zfs]
[ 606.112364] spa_open+0x13/0x20 [zfs]
[ 606.112459] dsl_pool_hold+0x33/0x80 [zfs]
[ 606.112538] dmu_objset_hold_flags+0x3b/0xc0 [zfs]
[ 606.112616] dmu_objset_hold+0x16/0x20 [zfs]
[ 606.112713] dsl_prop_get+0x44/0xb0 [zfs]
[ 606.112831] ? list_insert_tail+0x20/0x20 [zfs]
[ 606.112927] dsl_prop_get_integer+0x1e/0x20 [zfs]
[ 606.113044] zvol_create_minors_cb+0x3b/0x100 [zfs]
[ 606.113152] ? rrw_exit+0x61/0xb0 [zfs]
[ 606.113229] dmu_objset_find_impl+0x107/0x410 [zfs]
[ 606.113349] ? list_insert_tail+0x20/0x20 [zfs]
[ 606.113426] dmu_objset_find_impl+0x211/0x410 [zfs]
[ 606.113545] ? list_insert_tail+0x20/0x20 [zfs]
[ 606.113622] dmu_objset_find_impl+0x211/0x410 [zfs]
[ 606.113740] ? list_insert_tail+0x20/0x20 [zfs]
[ 606.113818] dmu_objset_find+0x58/0x90 [zfs]
[ 606.113955] zvol_create_minors_impl+0x193/0x1b0 [zfs]
[ 606.114079] zvol_task_cb+0xc1/0xf0 [zfs]
[ 606.114099] taskq_thread+0x210/0x400 [spl]
[ 606.114108] ? __switch_to_asm+0x40/0x70
[ 606.114117] ? wake_up_q+0x80/0x80
[ 606.114125] kthread+0x121/0x140
[ 606.114141] ? task_done+0x90/0x90 [spl]
[ 606.114147] ? kthread_park+0xb0/0xb0
[ 606.114155] ret_from_fork+0x35/0x40
[ 606.114214] INFO: task spa_async:12502 blocked for more than 120 seconds.
[ 606.114263] Tainted: P O 5.3.0-62-generic #5618.04.1-Ubuntu
[ 606.114313] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 606.114365] spa_async D 0 12502 2 0x80004000
[ 606.114368] Call Trace:
[ 606.114372] __schedule+0x2a8/0x670
[ 606.114375] schedule+0x33/0xa0
[ 606.114384] cv_wait_common+0x14c/0x160 [spl]
[ 606.114388] ? wait_woken+0x80/0x80
[ 606.114397] __cv_wait+0x15/0x20 [spl]
[ 606.114515] spa_config_enter+0x7c/0x110 [zfs]
[ 606.114633] spa_config_update+0x34/0x160 [zfs]
[ 606.114752] spa_async_thread+0x453/0x530 [zfs]
[ 606.114873] ? spa_async_autoexpand+0x90/0x90 [zfs]
[ 606.114890] thread_generic_wrapper+0x74/0x90 [spl]
[ 606.114898] kthread+0x121/0x140
[ 606.114914] ? __thread_exit+0x20/0x20 [spl]
[ 606.114917] ? kthread_park+0xb0/0xb0
[ 606.114921] ret_from_fork+0x35/0x40
[ 726.942290] INFO: task spl_system_task:1487 blocked for more than 241 seconds.
[ 726.942349] Tainted: P O 5.3.0-62-generic #56
18.04.1-Ubuntu
[ 726.942398] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 726.942450] spl_system_task D 0 1487 2 0x80004000
[ 726.942454] Call Trace:
[ 726.942464] __schedule+0x2a8/0x670
[ 726.942469] schedule+0x33/0xa0
[ 726.942472] schedule_preempt_disabled+0xe/0x10
[ 726.942475] __mutex_lock.isra.9+0x26d/0x4e0
[ 726.942481] ? update_load_avg+0x17d/0x680
[ 726.942484] ? update_load_avg+0x17d/0x680
[ 726.942487] __mutex_lock_slowpath+0x13/0x20
[ 726.942489] ? __mutex_lock_slowpath+0x13/0x20
[ 726.942491] mutex_lock+0x2f/0x40
[ 726.942647] spa_open_common+0x68/0x500 [zfs]
[ 726.942652] ? __switch_to_asm+0x40/0x70
[ 726.942655] ? __switch_to_asm+0x34/0x70
[ 726.942657] ? __switch_to_asm+0x40/0x70
[ 726.942660] ? __switch_to_asm+0x34/0x70
[ 726.942778] spa_open+0x13/0x20 [zfs]
[ 726.942904] dsl_pool_hold+0x33/0x80 [zfs]
[ 726.942994] dmu_objset_own+0x46/0x190 [zfs]
[ 726.943104] zvol_prefetch_minors_impl+0x4b/0xa0 [zfs]
[ 726.943122] taskq_thread+0x210/0x400 [spl]
[ 726.943126] ? __switch_to_asm+0x40/0x70
[ 726.943131] ? wake_up_q+0x80/0x80
[ 726.943136] kthread+0x121/0x140
[ 726.943147] ? task_done+0x90/0x90 [spl]
[ 726.943150] ? kthread_park+0xb0/0xb0
[ 726.943153] ret_from_fork+0x35/0x40
[ 726.943160] INFO: task l2arc_feed:2110 blocked for more than 241 seconds.
[ 726.943209] Tainted: P O 5.3.0-62-generic #56~18.04.1-Ubuntu
[ 726.943259] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 726.943311] l2arc_feed D 0 2110 2 0x80004000
[ 726.943314] Call Trace:
[ 726.943317] __schedule+0x2a8/0x670
[ 726.943321] schedule+0x33/0xa0
[ 726.943323] schedule_preempt_disabled+0xe/0x10
[ 726.943326] __mutex_lock.isra.9+0x26d/0x4e0
[ 726.943330] __mutex_lock_slowpath+0x13/0x20
[ 726.943332] ? __mutex_lock_slowpath+0x13/0x20
[ 726.943335] mutex_lock+0x2f/0x40
[ 726.943391] l2arc_dev_get_next+0x1a/0x1e0 [zfs]
[ 726.943449] l2arc_feed_thread+0xd6/0x250 [zfs]
[ 726.943506] ? l2arc_evict+0x290/0x290 [zfs]
[ 726.943519] thread_generic_wrapper+0x74/0x90 [spl]
[ 726.943522] kthread+0x121/0x140
[ 726.943534] ? __thread_exit+0x20/0x20 [spl]
[ 726.943536] ? kthread_park+0xb0/0xb0
[ 726.943540] ret_from_fork+0x35/0x40

update 👍
726.943160] INFO: task l2arc_feed:2110 blocked for more than 241 seconds.
[ 726.943209] Tainted: P O 5.3.0-62-generic #56~18.04.1-Ubuntu
[ 726.943259] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 726.943311] l2arc_feed D 0 2110 2 0x80004000
[ 726.943314] Call Trace:
[ 726.943317] __schedule+0x2a8/0x670
[ 726.943321] schedule+0x33/0xa0
[ 726.943323] schedule_preempt_disabled+0xe/0x10
[ 726.943326] __mutex_lock.isra.9+0x26d/0x4e0
[ 726.943330] __mutex_lock_slowpath+0x13/0x20
[ 726.943332] ? __mutex_lock_slowpath+0x13/0x20
[ 726.943335] mutex_lock+0x2f/0x40
[ 726.943391] l2arc_dev_get_next+0x1a/0x1e0 [zfs]
[ 726.943449] l2arc_feed_thread+0xd6/0x250 [zfs]
[ 726.943506] ? l2arc_evict+0x290/0x290 [zfs]
[ 726.943519] thread_generic_wrapper+0x74/0x90 [spl]
[ 726.943522] kthread+0x121/0x140
[ 726.943534] ? __thread_exit+0x20/0x20 [spl]
[ 726.943536] ? kthread_park+0xb0/0xb0
[ 726.943540] ret_from_fork+0x35/0x40
[ 6928.956809] perf: interrupt took too long (2502 > 2500), lowering kernel.perf_event_max_sample_rate to 79750

snajpa added a commit to vpsfreecz/zfs that referenced this issue Sep 3, 2020
There are three locks at play, when handling ZIL records. First is a
zl_issuer_lock, that is supposed to be held, whenever we're issuing
a new record to ZIL. Then there is zl_lock to actually protect the in
the zilog struct. And then there's a last lock, zcw_lock, which is
used to protect the lists of ZIL call-back waiters.

The order of locking is supposed to be:

zl_issuer_lock -> zcw_lock -> zl_lock; as implied their usage
in zil_alloc_lwb and zil_commit_waiter_timeout functions.

Function zil_commit_waiter_link_lwb goes against this, it is expecting
to be entered with zl_lock already held and only then it is taking
the zcw_lock;

This patch straightens the locking in zil_commit_waiter_link_lwb to
take the zl_lock on its own, correcting the locking order.

Following is an attached messsage from the Linux lockdep mechanism
describing the potential deadlock - in our production, it was a matter
of a few hours to hit this one...

(redacted to fit 72 cols)

systemd-journal/5561 is trying to acquire lock:
(&zilog->zl_lock){+.+.}-{4:4}, at: zil_alloc_lwb+0x1df/0x3e0

but task is already holding lock:
(&zcw->zcw_lock){+.+.}-{4:4}, at: zil_commit_impl+0x52b/0x1850

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&zcw->zcw_lock){+.+.}-{4:4}:
       __mutex_lock+0xac/0x9e0
       mutex_lock_nested+0x1b/0x20
       zil_commit_waiter_link_lwb+0x51/0x1a0 [zfs]
       zil_commit_impl+0x12b0/0x1850 [zfs]
       zil_commit+0x43/0x60 [zfs]
       zpl_writepages+0xf8/0x1a0 [zfs]
       do_writepages+0x43/0xf0
       __filemap_fdatawrite_range+0xd5/0x110
       filemap_write_and_wait_range+0x4b/0xb0
       zpl_fsync+0x4d/0xb0 [zfs]
       vfs_fsync_range+0x49/0x80
       do_fsync+0x3d/0x70
       __x64_sys_fsync+0x14/0x20
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (&zilog->zl_lock){+.+.}-{4:4}:
       __lock_acquire+0x12b6/0x2480
       lock_acquire+0xab/0x380
       __mutex_lock+0xac/0x9e0
       mutex_lock_nested+0x1b/0x20
       zil_alloc_lwb+0x1df/0x3e0 [zfs]
       zil_lwb_write_issue+0x265/0x3f0 [zfs]
       zil_commit_impl+0x577/0x1850 [zfs]
       zil_commit+0x43/0x60 [zfs]
       zpl_writepages+0xf8/0x1a0 [zfs]
       do_writepages+0x43/0xf0
       __filemap_fdatawrite_range+0xd5/0x110
       filemap_write_and_wait_range+0x4b/0xb0
       zpl_fsync+0x4d/0xb0 [zfs]
       vfs_fsync_range+0x49/0x80
       do_fsync+0x3d/0x70
       __x64_sys_fsync+0x14/0x20
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&zcw->zcw_lock);
                               lock(&zilog->zl_lock);
                               lock(&zcw->zcw_lock);
  lock(&zilog->zl_lock);

 *** DEADLOCK ***

2 locks held by systemd-journal/5561:
 #0: (&zilog->zl_issuer_lock){+.+.}-{4:4}, at: zil_commit_impl+0x4dc...
 #1: (&zcw->zcw_lock){+.+.}-{4:4}, at: zil_commit_impl+0x52b/0x1850

Fixes openzfs#7038
Fixes openzfs#10440

Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
@snajpa snajpa mentioned this issue Sep 3, 2020
12 tasks
snajpa added a commit to vpsfreecz/zfs that referenced this issue Sep 3, 2020
There are three locks at play, when handling ZIL records. First is a
zl_issuer_lock, that is supposed to be held, whenever we're issuing
a new record to ZIL. Then there is zl_lock to actually protect the in
the zilog struct. And then there's a last lock, zcw_lock, which is
used to protect the lists of ZIL call-back waiters.

The order of locking is supposed to be:

zl_issuer_lock -> zcw_lock -> zl_lock; as implied their usage
in zil_alloc_lwb and zil_commit_waiter_timeout functions.

Function zil_commit_waiter_link_lwb goes against this, it is expecting
to be entered with zl_lock already held and only then it is taking
the zcw_lock;

This patch straightens the locking in zil_commit_waiter_link_lwb to
take the zl_lock on its own, correcting the locking order.

Following is an attached messsage from the Linux lockdep mechanism
describing the potential deadlock - in our production, it was a matter
of a few hours to hit this one...

(redacted to fit 72 cols)

systemd-journal/5561 is trying to acquire lock:
(&zilog->zl_lock){+.+.}-{4:4}, at: zil_alloc_lwb+0x1df/0x3e0

but task is already holding lock:
(&zcw->zcw_lock){+.+.}-{4:4}, at: zil_commit_impl+0x52b/0x1850

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&zcw->zcw_lock){+.+.}-{4:4}:
       __mutex_lock+0xac/0x9e0
       mutex_lock_nested+0x1b/0x20
       zil_commit_waiter_link_lwb+0x51/0x1a0 [zfs]
       zil_commit_impl+0x12b0/0x1850 [zfs]
       zil_commit+0x43/0x60 [zfs]
       zpl_writepages+0xf8/0x1a0 [zfs]
       do_writepages+0x43/0xf0
       __filemap_fdatawrite_range+0xd5/0x110
       filemap_write_and_wait_range+0x4b/0xb0
       zpl_fsync+0x4d/0xb0 [zfs]
       vfs_fsync_range+0x49/0x80
       do_fsync+0x3d/0x70
       __x64_sys_fsync+0x14/0x20
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (&zilog->zl_lock){+.+.}-{4:4}:
       __lock_acquire+0x12b6/0x2480
       lock_acquire+0xab/0x380
       __mutex_lock+0xac/0x9e0
       mutex_lock_nested+0x1b/0x20
       zil_alloc_lwb+0x1df/0x3e0 [zfs]
       zil_lwb_write_issue+0x265/0x3f0 [zfs]
       zil_commit_impl+0x577/0x1850 [zfs]
       zil_commit+0x43/0x60 [zfs]
       zpl_writepages+0xf8/0x1a0 [zfs]
       do_writepages+0x43/0xf0
       __filemap_fdatawrite_range+0xd5/0x110
       filemap_write_and_wait_range+0x4b/0xb0
       zpl_fsync+0x4d/0xb0 [zfs]
       vfs_fsync_range+0x49/0x80
       do_fsync+0x3d/0x70
       __x64_sys_fsync+0x14/0x20
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&zcw->zcw_lock);
                               lock(&zilog->zl_lock);
                               lock(&zcw->zcw_lock);
  lock(&zilog->zl_lock);

 *** DEADLOCK ***

2 locks held by systemd-journal/5561:
 #0: (&zilog->zl_issuer_lock){+.+.}-{4:4}, at: zil_commit_impl+0x4dc...
 #1: (&zcw->zcw_lock){+.+.}-{4:4}, at: zil_commit_impl+0x52b/0x1850

Fixes openzfs#7038
Fixes openzfs#10440

Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
snajpa added a commit to vpsfreecz/zfs that referenced this issue Sep 3, 2020
There are three locks at play, when handling ZIL records. First is a
zl_issuer_lock, that is supposed to be held, whenever we're issuing
a new record to ZIL. Then there is zl_lock to actually protect the in
the zilog struct. And then there's a last lock, zcw_lock, which is
used to protect the lists of ZIL call-back waiters.

The order of locking is supposed to be:

zl_issuer_lock -> zcw_lock -> zl_lock; as implied their usage
in zil_alloc_lwb and zil_commit_waiter_timeout functions.

Function zil_commit_waiter_link_lwb goes against this, it is expecting
to be entered with zl_lock already held and only then it is taking
the zcw_lock;

This patch straightens the locking in zil_commit_waiter_link_lwb to
take the zl_lock on its own, correcting the locking order.

Following is an attached messsage from the Linux lockdep mechanism
describing the potential deadlock - in our production, it was a matter
of a few hours to hit this one...

(redacted to fit 72 cols)

systemd-journal/5561 is trying to acquire lock:
(&zilog->zl_lock){+.+.}-{4:4}, at: zil_alloc_lwb+0x1df/0x3e0

but task is already holding lock:
(&zcw->zcw_lock){+.+.}-{4:4}, at: zil_commit_impl+0x52b/0x1850

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&zcw->zcw_lock){+.+.}-{4:4}:
       __mutex_lock+0xac/0x9e0
       mutex_lock_nested+0x1b/0x20
       zil_commit_waiter_link_lwb+0x51/0x1a0 [zfs]
       zil_commit_impl+0x12b0/0x1850 [zfs]
       zil_commit+0x43/0x60 [zfs]
       zpl_writepages+0xf8/0x1a0 [zfs]
       do_writepages+0x43/0xf0
       __filemap_fdatawrite_range+0xd5/0x110
       filemap_write_and_wait_range+0x4b/0xb0
       zpl_fsync+0x4d/0xb0 [zfs]
       vfs_fsync_range+0x49/0x80
       do_fsync+0x3d/0x70
       __x64_sys_fsync+0x14/0x20
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (&zilog->zl_lock){+.+.}-{4:4}:
       __lock_acquire+0x12b6/0x2480
       lock_acquire+0xab/0x380
       __mutex_lock+0xac/0x9e0
       mutex_lock_nested+0x1b/0x20
       zil_alloc_lwb+0x1df/0x3e0 [zfs]
       zil_lwb_write_issue+0x265/0x3f0 [zfs]
       zil_commit_impl+0x577/0x1850 [zfs]
       zil_commit+0x43/0x60 [zfs]
       zpl_writepages+0xf8/0x1a0 [zfs]
       do_writepages+0x43/0xf0
       __filemap_fdatawrite_range+0xd5/0x110
       filemap_write_and_wait_range+0x4b/0xb0
       zpl_fsync+0x4d/0xb0 [zfs]
       vfs_fsync_range+0x49/0x80
       do_fsync+0x3d/0x70
       __x64_sys_fsync+0x14/0x20
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&zcw->zcw_lock);
                               lock(&zilog->zl_lock);
                               lock(&zcw->zcw_lock);
  lock(&zilog->zl_lock);

 *** DEADLOCK ***

2 locks held by systemd-journal/5561:
 #0: (&zilog->zl_issuer_lock){+.+.}-{4:4}, at: zil_commit_impl+0x4dc...
 #1: (&zcw->zcw_lock){+.+.}-{4:4}, at: zil_commit_impl+0x52b/0x1850

Fixes openzfs#7038
Fixes openzfs#10440

Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
@chadjoan
Copy link

chadjoan commented Mar 7, 2021

CONGRATULATIONS!

You fixed THE BUG.

I was able to confirm that the bug was exterminated on my relevant system as of (or before) commit 35ac0ed on 2021-01-26.

I did this by reproducing the kworker 100% hang just before updating ZFS. Then I attempted to reproduce it after updating ZFS, and was unable to. The affected system had 56 threads and could easily trigger this bug, assuming one has an hour or so to wait around for it. So the certainty is quite high.

For methodology, here's a little story, with screenshots:
https://imgur.com/a/8UHLtni

That should provide some (hopefully replicable) experimental evidence of this bug's demise, which might give the internet a bit more closure.

More importantly, I wanted to thank you for doing this. Thank you!

I am grateful and ricing at full power with no fear :)

@Ninpo
Copy link

Ninpo commented Mar 7, 2021

Hurrah! Wonder when it'll end up in a stable release. I noticed portage was a good way to trigger it too.

@chadjoan
Copy link

chadjoan commented Mar 8, 2021

In case it helps, I just updated my ebuild repo, so anyone (using Gentoo) who wants the working version I am using can skip some fidgety ebuild-writing steps by using these:
https://github.com/chadjoan/server-portage-overlay/tree/master/sys-fs

Given that the other contents of the repo probably aren't of interest to most people, you'd probably be best served by just copy-pasting those folders or ebuilds into your own local overlay, ex: /usr/local/portage on my system results in /usr/local/portage/sys-fs/zfs and /usr/local/portage/sys-fs/zfs-kmod (this requires a properly configured local overlay, of course).

ZFS has continuous integration and the devs seem to be very thoughtful in their work, so I feel pretty safe just using any commit that passes most-or-all CI tests, though I do read some of the preceding commits just to spot-check for anything alarming. So far I haven't seen any commits that I wouldn't want. IMO totally worth it, especially when the alternative involves enduring a bug like this one.

Of course, when working for a business, I could understand still wanting to wait for an official release due to political/CYA reasons; but if there's some bug like this that actively impedes resource utilization or productivity, then one might be able to negotiate for a practical exception to any of those pesky bureaucratic rules and expectations :3

@stale
Copy link

stale bot commented Mar 9, 2022

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Mar 9, 2022
@behlendorf
Copy link
Contributor

Closing. Fixed in 2.0 and later releases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Stale No recent activity for issue Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

Successfully merging a pull request may close this issue.