
Crash on scrub #8664

Closed
marker5a opened this issue Apr 24, 2019 · 29 comments · Fixed by #9321
Labels
Type: Defect (Incorrect behavior, e.g. crash, hang)

Comments

@marker5a

System information

Type Version/Name
Distribution Name Arch Linux
Distribution Version -
Linux Kernel 5.0.9-1-ck
Architecture x86_64
ZFS Version 0.8.0-rc4
SPL Version 0.8.0-rc4

Describe the problem you're observing

I have a ZFS pool running as my root filesystem, and upon starting a scrub on the pool, the system gets a kernel panic and the tty freezes up.

Describe how to reproduce the problem

Run zpool scrub on the pool.
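For example, with zroot as the pool name here:

  zpool scrub zroot
  zpool status -v zroot   # the hang follows shortly after the scrub starts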

Include any warning/errors/backtraces from the system logs

pool: zroot
state: ONLINE
scan: scrub paused since Tue Apr 23 20:09:21 2019
scrub started on Wed Feb 20 22:25:20 2019
0B scanned, 0B issued, 1.80T total
0B repaired, 0.00% done
config:

NAME                              STATE     READ WRITE CKSUM
zroot                             ONLINE       0     0     0
  raidz2-0                        ONLINE       0     0     0
    wwn-0x5000c500a5537122-part2  ONLINE       0     0     0
    wwn-0x5000c5006615a81c-part1  ONLINE       0     0     0
    wwn-0x5000c5007452f435        ONLINE       0     0     0
    wwn-0x50014ee25f72f76b        ONLINE       0     0     0

I'm trying to get a proper log, but wanted to at least share a picture that I was able to take... the system is not currently set up to save kernel panic info to a file, so once I can get more info on that, I will post it.

IMG_20190423_183323
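(For capturing the panic, a serial console is one option; a rough sketch, assuming a ttyS0 serial port and a second machine on the other end of the cable:)

  console=tty0 console=ttyS0,115200n8   # added to the kernel command line
  screen /dev/ttyUSB0 115200            # run on the machine at the other end of the cable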

@behlendorf
Contributor

I can't be entirely certain based on the screenshot, but I believe this is due to a change in the latest 5.0 kernel related to cgroups. I'd suggest rolling back your kernel to 4.19 while we sort this out.

@behlendorf added the Type: Building (Indicates an issue related to building binaries) label Apr 24, 2019
@behlendorf added this to the 0.8.0 milestone Apr 24, 2019
@marker5a
Author

Still seeing similar behavior on 4.18 (tried 4.19 as well); the only difference now is that dmesg isn't spitting anything out.

I'll go back to the 5.0 kernel and try to get better logs, but is there anything in particular of good use to look at? My knowledge of these subjects is not even a fraction of yours, so please bear with me!

@marker5a
Author

I should also point out that booting into another system and accessing this pool as a non-root partition works fine. I need to double-check which kernel was running in that test... will try to get that asap. Also, scrubbing a different pool from the same 5.0 kernel works fine too. It's weird that it seems to just be this combo.

@behlendorf
Contributor

So are you saying everything appears to work fine using the 5.0.9-1-ck kernel on Arch Linux as long as the pool isn't the root pool? That would be strange.

is there anything in particular of good use to look at

If possible it would be great to get the first stack dump from the console logs.

@marker5a
Author

marker5a commented May 1, 2019

Hey behlendorf

That seems to be the case for me... I can scrub it fine with whatever kernel I throw at it, unless it's the root pool. I've been tempted to create a new pool from this dataset and see if the problem moves with the pool, but for the time being, this seems like a good test case for a potential issue... not sure though?

I was finally able to get more data from my setup by using a serial cable to dump things out... I hope this helps further. I should've mentioned earlier... thanks to you (and all the developers) for all you do... I love the project! Thanks for taking the time to look at this!


[ 489.388218]  ? spl_kmem_cache_alloc+0x112/0x760 [spl]
[  489.388221]  ? preempt_count_add+0x79/0xb0
[  489.388223]  schedule+0x32/0xb0
[  489.388228]  cv_wait_common+0xf9/0x120 [spl]
[  489.388231]  ? wait_woken+0x80/0x80
[  489.388319]  spa_config_enter+0xef/0x100 [zfs]
[  489.388370]  zio_vdev_io_start+0x2de/0x2f0 [zfs]
[  489.388376]  ? tsd_get_by_thread+0x2a/0x40 [spl]
[  489.388425]  zio_nowait+0xc3/0x180 [zfs]
[  489.388457]  arc_read+0x8d6/0xfe0 [zfs]
[  489.388490]  ? dbuf_rele_and_unlock+0x620/0x620 [zfs]
[  489.388523]  dbuf_read+0x265/0xb40 [zfs]
[  489.388563]  dmu_tx_check_ioerr+0x61/0xb0 [zfs]
[  489.388602]  dmu_tx_count_write+0xe2/0x1a0 [zfs]
[  489.388641]  dmu_tx_hold_write+0x3c/0x50 [zfs]
[  489.388691]  zfs_putpage+0x1af/0x520 [zfs]
[  489.388694]  ? rmap_walk_file+0x123/0x260
[  489.388698]  ? percpu_counter_add_batch+0x84/0xb0
[  489.388746]  zpl_putpage+0x31/0x40 [zfs]
[  489.388750]  write_cache_pages+0x1c2/0x430
[  489.388798]  ? zpl_write_common_iovec+0xf0/0xf0 [zfs]
[  489.388800]  ? native_usergs_sysret64+0xf/0x10
[  489.388849]  zpl_writepages+0x8a/0x150 [zfs]
[  489.388851]  do_writepages+0x41/0xd0
[  489.388854]  ? lock_timer_base+0x67/0x80
[  489.388856]  ? try_to_del_timer_sync+0x4d/0x80
[  489.388858]  __writeback_single_inode+0x3d/0x3c0
[  489.388860]  ? del_timer_sync+0x35/0x40
[  489.388862]  writeback_sb_inodes+0x1e8/0x4c0
[  489.388865]  __writeback_inodes_wb+0x5d/0xb0
[  489.388867]  wb_writeback+0x28f/0x340
[  489.388870]  ? cpumask_next+0x16/0x20
[  489.388872]  wb_workfn+0x35d/0x430
[  489.388876]  process_one_work+0x1eb/0x410
[  489.388878]  worker_thread+0x2d/0x3d0
[  489.388881]  ? process_one_work+0x410/0x410
[  489.388883]  kthread+0x112/0x130
[  489.388885]  ? kthread_park+0x80/0x80
[  489.388887]  ret_from_fork+0x35/0x40
[  489.388908] INFO: task txg_sync:606 blocked for more than 120 seconds.
[  489.396078]       Tainted: P           OE     5.0.9-1-ck #1
[  489.402978] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  489.412320] txg_sync        D    0   606      2 0x80000000
[  489.412322] Call Trace:
[  489.412325]  ? __schedule+0x6d9/0xe60
[  489.412328]  ? preempt_count_add+0x79/0xb0
[  489.412330]  ? _raw_spin_lock+0x13/0x30
[  489.412331]  ? preempt_count_add+0x79/0xb0
[  489.412333]  schedule+0x32/0xb0
[  489.412338]  cv_wait_common+0xf9/0x120 [spl]
[  489.412340]  ? wait_woken+0x80/0x80
[  489.412391]  spa_config_enter+0xef/0x100 [zfs]
[  489.412440]  spa_sync+0x197/0xfb0 [zfs]
[  489.412443]  ? _raw_spin_lock+0x13/0x30
[  489.412445]  ? _raw_spin_unlock+0x16/0x30
[  489.412447]  ? preempt_count_add+0x79/0xb0
[  489.412448]  ? _raw_spin_lock+0x13/0x30
[  489.412449]  ? _raw_spin_unlock+0x16/0x30
[  489.412500]  ? spa_txg_history_set.part.2+0xbd/0xd0 [zfs]
[  489.412551]  ? spa_txg_history_init_io+0x106/0x110 [zfs]
[  489.412601]  txg_sync_thread+0x28f/0x460 [zfs]
[  489.412653]  ? txg_thread_exit.isra.4+0x60/0x60 [zfs]
[  489.412659]  ? __thread_exit+0x20/0x20 [spl]
[  489.412665]  thread_generic_wrapper+0x6f/0x80 [spl]
[  489.412668]  kthread+0x112/0x130
[  489.412669]  ? kthread_park+0x80/0x80
[  489.412671]  ret_from_fork+0x35/0x40
[  489.412675] INFO: task journal-offline:14234 blocked for more than 120 seconds.
[  489.421012]       Tainted: P           OE     5.0.9-1-ck #1
[  489.421015] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  489.437193] journal-offline D    0 14234      1 0x00000180
[  489.437197] Call Trace:
[  489.437206]  ? __schedule+0x6d9/0xe60
[  489.437281]  ? dmu_zfetch+0x530/0x540 [zfs]
[  489.437285]  ? preempt_count_add+0x79/0xb0
[  489.437287]  ? _raw_spin_lock+0x13/0x30
[  489.437289]  ? preempt_count_add+0x79/0xb0
[  489.437291]  schedule+0x32/0xb0
[  489.437300]  cv_wait_common+0xf9/0x120 [spl]
[  489.437303]  ? wait_woken+0x80/0x80
[  489.437336]  dbuf_read+0x3f5/0xb40 [zfs]
[  489.437370]  ? dbuf_hold_impl+0xa3/0xc0 [zfs]
[  489.437409]  dmu_tx_check_ioerr+0x61/0xb0 [zfs]
[  489.437448]  dmu_tx_count_write+0xe2/0x1a0 [zfs]
[  489.437487]  dmu_tx_hold_write+0x3c/0x50 [zfs]
[  489.437536]  zfs_putpage+0x1af/0x520 [zfs]
[  489.437540]  ? rmap_walk_file+0x123/0x260
[  489.437545]  ? percpu_counter_add_batch+0x84/0xb0
[  489.437593]  zpl_putpage+0x31/0x40 [zfs]
[  489.437596]  write_cache_pages+0x1c2/0x430
[  489.437644]  ? zpl_write_common_iovec+0xf0/0xf0 [zfs]
[  489.437693]  zpl_writepages+0x8a/0x150 [zfs]
[  489.437696]  do_writepages+0x41/0xd0
[  489.437699]  __filemap_fdatawrite_range+0xbe/0xf0
[  489.437702]  filemap_write_and_wait_range+0x30/0x80
[  489.437750]  zpl_fsync+0x37/0x90 [zfs]
[  489.437753]  ? __fget+0x6e/0xa0
[  489.437756]  do_fsync+0x38/0x70
[  489.437758]  __x64_sys_fsync+0x10/0x20
[  489.437761]  do_syscall_64+0x5b/0x170
[  489.437763]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  489.437766] RIP: 0033:0x7f7227fa7807
[  489.437771] Code: Bad RIP value.
[  489.437773] RSP: 002b:00007f7225101db0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[  489.437775] RAX: ffffffffffffffda RBX: 0000000000000016 RCX: 00007f7227fa7807
[  489.437776] RDX: 0000000000000000 RSI: 00007f7227cce8c6 RDI: 0000000000000016
[  489.437777] RBP: 00007f7227cd0120 R08: 00007f7225102700 R09: 00007f7225102700
[  489.437778] R10: 00000000000008c6 R11: 0000000000000293 R12: 0000000000000002
[  489.437779] R13: 00007ffdc2834bcf R14: 00007f7225102700 R15: 0000000000000015
[  489.437785] INFO: task sd-resolve:4715 blocked for more than 120 seconds.
[  489.445325]       Tainted: P           OE     5.0.9-1-ck #1
[  489.452233] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  489.461538] sd-resolve      D    0  4715      1 0x00000180
[  489.461540] Call Trace:
[  489.461543]  ? __schedule+0x6d9/0xe60
[  489.461549]  ? spl_kmem_cache_alloc+0x112/0x760 [spl]
[  489.461554]  ? spl_kmem_cache_alloc+0x112/0x760 [spl]
[  489.461557]  ? preempt_count_add+0x79/0xb0
[  489.461559]  schedule+0x32/0xb0
[  489.461563]  cv_wait_common+0xf9/0x120 [spl]
[  489.461565]  ? wait_woken+0x80/0x80
[  489.461616]  spa_config_enter+0xef/0x100 [zfs]
[  489.461666]  zio_vdev_io_start+0x2de/0x2f0 [zfs]
[  489.461672]  ? tsd_get_by_thread+0x2a/0x40 [spl]
[  489.461721]  zio_nowait+0xc3/0x180 [zfs]
[  489.461753]  arc_read+0x8d6/0xfe0 [zfs]
[  489.461786]  ? dbuf_rele_and_unlock+0x620/0x620 [zfs]
[  489.461819]  dbuf_read+0x265/0xb40 [zfs]
[  489.461855]  dmu_buf_hold_array_by_dnode+0x104/0x470 [zfs]
[  489.461892]  dmu_read_uio_dnode+0x47/0xf0 [zfs]
[  489.461895]  ? _raw_spin_unlock+0x16/0x30
[  489.461945]  ? rangelock_enter+0x14d/0x510 [zfs]
[  489.461981]  dmu_read_uio_dbuf+0x3f/0x60 [zfs]
[  489.462030]  zfs_read+0x127/0x480 [zfs]
[  489.462079]  zpl_read_common_iovec+0x87/0xd0 [zfs]
[  489.462128]  zpl_iter_read+0xd4/0x120 [zfs]
[  489.462131]  __vfs_read+0x14a/0x1a0
[  489.462134]  vfs_read+0x8a/0x130
[  489.462136]  ksys_read+0x4f/0xb0
[  489.462138]  do_syscall_64+0x5b/0x170
[  489.462140]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  489.462141] RIP: 0033:0x7fb4e6032098
[  489.462144] Code: Bad RIP value.
[  489.462145] RSP: 002b:00007fb4e466e248 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[  489.462147] RAX: ffffffffffffffda RBX: 00007fb4dc000b20 RCX: 00007fb4e6032098
[  489.462148] RDX: 0000000000000200 RSI: 00007fb4dc000df0 RDI: 000000000000000f
[  489.462149] RBP: 0000000000000d68 R08: 0000000000000c00 R09: 00007fb4e46746e4
[  489.462150] R10: 00007fb4e466e7a0 R11: 0000000000000246 R12: 00007fb4e60fc5c0
[  489.462152] R13: 00007fb4e60fd1c0 R14: 0000000000000000 R15: 00007fb4e466e790
[  489.462157] INFO: task openvpn:5550 blocked for more than 120 seconds.
[  489.469400]       Tainted: P           OE     5.0.9-1-ck #1
[  489.476325] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  489.485628] openvpn         D    0  5550      1 0x00000080
[  489.485630] Call Trace:
[  489.485633]  ? __schedule+0x6d9/0xe60
[  489.485638]  ? spl_kmem_cache_alloc+0x112/0x760 [spl]
[  489.485644]  ? spl_kmem_cache_alloc+0x112/0x760 [spl]
[  489.485646]  ? preempt_count_add+0x79/0xb0
[  489.485647]  schedule+0x32/0xb0
[  489.485652]  cv_wait_common+0xf9/0x120 [spl]
[  489.485654]  ? wait_woken+0x80/0x80
[  489.485705]  spa_config_enter+0xef/0x100 [zfs]
[  489.485754]  zio_vdev_io_start+0x2de/0x2f0 [zfs]
[  489.485760]  ? tsd_get_by_thread+0x2a/0x40 [spl]
[  489.485809]  zio_nowait+0xc3/0x180 [zfs]
[  489.485840]  arc_read+0x8d6/0xfe0 [zfs]
[  489.485873]  ? dbuf_rele_and_unlock+0x620/0x620 [zfs]
[  489.485922]  ? zio_null+0x2b/0x30 [zfs]
[  489.485955]  dbuf_read+0x265/0xb40 [zfs]
[  489.485995]  dmu_tx_check_ioerr+0x61/0xb0 [zfs]
[  489.486033]  dmu_tx_count_write+0x65/0x1a0 [zfs]
[  489.486072]  dmu_tx_hold_write+0x3c/0x50 [zfs]
[  489.486121]  zfs_write+0x3e3/0xd60 [zfs]
[  489.486126]  ? poll_select_copy_remaining+0x1b0/0x1b0
[  489.486129]  ? kmem_cache_free+0x1d3/0x200
[  489.486177]  zpl_write_common_iovec+0x94/0xf0 [zfs]
[  489.486225]  zpl_iter_write+0xf0/0x130 [zfs]
[  489.486228]  __vfs_write+0x14d/0x1b0
[  489.486230]  vfs_write+0xa9/0x1a0
[  489.486232]  ksys_write+0x4f/0xb0
[  489.486235]  do_syscall_64+0x5b/0x170
[  489.486236]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  489.486238] RIP: 0033:0x7f9a3b1c81c5
[  489.486240] Code: Bad RIP value.
[  489.486241] RSP: 002b:00007ffee2190ed8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  489.486243] RAX: ffffffffffffffda RBX: 000055792ba5df00 RCX: 00007f9a3b1c81c5
[  489.486244] RDX: 000000000000009e RSI: 00007ffee2190f00 RDI: 0000000000000004
[  489.486245] RBP: 000000000000009e R08: 0000000000000164 R09: 0000000000000000
[  489.486247] R10: 00007ffee2191102 R11: 0000000000000246 R12: 00007ffee2190f00
[  489.486248] R13: 0000000000000004 R14: 000055792ba55500 R15: 0000000000000005
[  489.486252] INFO: task openvpn:5553 blocked for more than 120 seconds.
[  489.493492]       Tainted: P           OE     5.0.9-1-ck #1
[  489.500400] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  489.509712] openvpn         D    0  5553      1 0x00000080
[  489.509713] Call Trace:
[  489.509716]  ? __schedule+0x6d9/0xe60
[  489.509722]  ? spl_kmem_cache_alloc+0x112/0x760 [spl]
[  489.509724]  ? preempt_count_add+0x79/0xb0
[  489.509726]  schedule+0x32/0xb0
[  489.509730]  cv_wait_common+0xf9/0x120 [spl]
[  489.509732]  ? wait_woken+0x80/0x80
[  489.509783]  spa_config_enter+0xef/0x100 [zfs]
[  489.509832]  zio_vdev_io_start+0x2de/0x2f0 [zfs]
[  489.509839]  ? tsd_get_by_thread+0x2a/0x40 [spl]
[  489.509887]  zio_nowait+0xc3/0x180 [zfs]
[  489.509919]  arc_read+0x8d6/0xfe0 [zfs]
[  489.509951]  ? dbuf_rele_and_unlock+0x620/0x620 [zfs]
[  489.510000]  ? zio_null+0x2b/0x30 [zfs]
[  489.510033]  dbuf_read+0x265/0xb40 [zfs]
[  489.510073]  dmu_tx_check_ioerr+0x61/0xb0 [zfs]
[  489.510111]  dmu_tx_count_write+0x65/0x1a0 [zfs]
[  489.510149]  dmu_tx_hold_write+0x3c/0x50 [zfs]
[  489.510199]  zfs_write+0x3e3/0xd60 [zfs]
[  489.510203]  ? __switch_to_asm+0x34/0x70
[  489.510204]  ? __switch_to_asm+0x40/0x70
[  489.510206]  ? __switch_to_asm+0x34/0x70
[  489.510207]  ? __switch_to_asm+0x40/0x70
[  489.510209]  ? __switch_to_asm+0x34/0x70
[  489.510210]  ? __switch_to_asm+0x40/0x70
[  489.510211]  ? __switch_to_asm+0x34/0x70
[  489.510213]  ? __switch_to_asm+0x40/0x70
[  489.510214]  ? __switch_to_asm+0x34/0x70
[  489.510216]  ? __switch_to_asm+0x40/0x70
[  489.510217]  ? _raw_spin_unlock+0x16/0x30
[  489.510266]  zpl_write_common_iovec+0x94/0xf0 [zfs]
[  489.510314]  zpl_iter_write+0xf0/0x130 [zfs]
[  489.510317]  __vfs_write+0x14d/0x1b0
[  489.510319]  vfs_write+0xa9/0x1a0
[  489.510321]  ksys_write+0x4f/0xb0
[  489.510323]  do_syscall_64+0x5b/0x170
[  489.510325]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  489.510327] RIP: 0033:0x7fa4dfdd11c5
[  489.510329] Code: Bad RIP value.
[  489.510330] RSP: 002b:00007ffc38482808 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  489.510332] RAX: ffffffffffffffda RBX: 000055db0fd0cc40 RCX: 00007fa4dfdd11c5
[  489.510333] RDX: 000000000000009e RSI: 00007ffc38482830 RDI: 0000000000000003
[  489.510334] RBP: 000000000000009e R08: 0000000000000164 R09: 0000000000000000
[  489.510335] R10: 00007ffc38482a32 R11: 0000000000000246 R12: 00007ffc38482830
[  489.510337] R13: 0000000000000003 R14: 000055db0fd04420 R15: 0000000000000000
[  489.510342] INFO: task ntpd:6079 blocked for more than 120 seconds.
[  489.517324]       Tainted: P           OE     5.0.9-1-ck #1
[  489.524237] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  489.533556] ntpd            D    0  6079      1 0x00000080
[  489.533558] Call Trace:
[  489.533561]  ? __schedule+0x6d9/0xe60
[  489.533567]  ? spl_kmem_cache_alloc+0x112/0x760 [spl]
[  489.533572]  ? spl_kmem_cache_alloc+0x112/0x760 [spl]
[  489.533574]  ? preempt_count_add+0x79/0xb0
[  489.533576]  schedule+0x32/0xb0
[  489.533580]  cv_wait_common+0xf9/0x120 [spl]
[  489.533582]  ? wait_woken+0x80/0x80
[  489.533633]  spa_config_enter+0xef/0x100 [zfs]
[  489.533682]  zio_vdev_io_start+0x2de/0x2f0 [zfs]
[  489.533688]  ? tsd_get_by_thread+0x2a/0x40 [spl]
[  489.533737]  zio_nowait+0xc3/0x180 [zfs]
[  489.533768]  arc_read+0x8d6/0xfe0 [zfs]
[  489.533801]  ? dbuf_rele_and_unlock+0x620/0x620 [zfs]
[  489.533835]  dbuf_read+0x265/0xb40 [zfs]
[  489.533871]  dmu_buf_hold_array_by_dnode+0x104/0x470 [zfs]
[  489.533907]  dmu_read_uio_dnode+0x47/0xf0 [zfs]
[  489.533909]  ? _raw_spin_unlock+0x16/0x30
[  489.533959]  ? rangelock_enter+0x14d/0x510 [zfs]
[  489.533995]  dmu_read_uio_dbuf+0x3f/0x60 [zfs]
[  489.534045]  zfs_read+0x127/0x480 [zfs]
[  489.534094]  zpl_read_common_iovec+0x87/0xd0 [zfs]
[  489.534142]  zpl_iter_read+0xd4/0x120 [zfs]
[  489.534145]  __vfs_read+0x14a/0x1a0
[  489.534147]  vfs_read+0x8a/0x130
[  489.534149]  ksys_read+0x4f/0xb0
[  489.534151]  do_syscall_64+0x5b/0x170
[  489.534153]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  489.534154] RIP: 0033:0x7fdacecc2098
[  489.534157] Code: Bad RIP value.
[  489.534158] RSP: 002b:00007fdacebc5478 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[  489.534160] RAX: ffffffffffffffda RBX: 00007fdac8001180 RCX: 00007fdacecc2098
[  489.534161] RDX: 0000000000002000 RSI: 00007fdac8004430 RDI: 0000000000000004
[  489.534162] RBP: 0000000000000d68 R08: 0000000000000000 R09: 0000000000000003
[  489.534163] R10: 00007fdac80008d0 R11: 0000000000000246 R12: 00007fdaced8c5c0
[  489.534164] R13: 00007fdaced8d1c0 R14: 0000000000000000 R15: 0000000000000400
[  489.534171] INFO: task systemd:13225 blocked for more than 120 seconds.
[  489.541509]       Tainted: P           OE     5.0.9-1-ck #1
[  489.548420] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  489.557743] systemd         D    0 13225      1 0x00000080
[  489.557745] Call Trace:
[  489.557747]  ? __schedule+0x6d9/0xe60
[  489.557753]  ? spl_kmem_cache_alloc+0x112/0x760 [spl]
[  489.557756]  ? preempt_count_add+0x79/0xb0
[  489.557757]  schedule+0x32/0xb0
[  489.557761]  cv_wait_common+0xf9/0x120 [spl]
[  489.557764]  ? wait_woken+0x80/0x80
[  489.557814]  spa_config_enter+0xef/0x100 [zfs]
[  489.557866]  zio_vdev_io_start+0x2de/0x2f0 [zfs]
[  489.557872]  ? tsd_get_by_thread+0x2a/0x40 [spl]
[  489.557921]  zio_nowait+0xc3/0x180 [zfs]
[  489.557952]  arc_read+0x8d6/0xfe0 [zfs]
[  489.557984]  ? dbuf_rele_and_unlock+0x620/0x620 [zfs]
[  489.558018]  dbuf_read+0x265/0xb40 [zfs]
[  489.558054]  dmu_buf_hold_array_by_dnode+0x104/0x470 [zfs]
[  489.558090]  dmu_read_impl+0xa3/0x170 [zfs]
[  489.558126]  dmu_read+0x56/0x90 [zfs]
[  489.558128]  ? _raw_spin_lock+0x13/0x30
[  489.558177]  zfs_getpage+0xe4/0x170 [zfs]
[  489.558226]  zpl_readpage+0x53/0xb0 [zfs]
[  489.558228]  filemap_fault+0x3d0/0x630
[  489.558231]  __do_fault+0x38/0x120
[  489.558233]  __handle_mm_fault+0xb37/0x12a0
[  489.558236]  handle_mm_fault+0x10a/0x250
[  489.558239]  __do_page_fault+0x264/0x4d0
[  489.558241]  do_page_fault+0x32/0x130
[  489.558244]  ? page_fault+0x8/0x30
[  489.558245]  page_fault+0x1e/0x30
[  489.558247] RIP: 0033:0x7f41663d4dd0
[  489.558249] Code: Bad RIP value.
[  489.558250] RSP: 002b:00007fffdb4f7ed8 EFLAGS: 00010202
[  489.558252] RAX: 0000556f1ebbb240 RBX: 0000556f1ebbb970 RCX: 0000556f1ebbb240
[  489.558253] RDX: 0000556f1ebbb800 RSI: 0000556f1ebbb1e0 RDI: 0000556f1ebaae18
[  489.558254] RBP: 0000556f1ebaae18 R08: 0000556f1ebbb229 R09: 0000000000000020
[  489.558255] R10: 00000000000000c0 R11: 00007f41664dfc40 R12: 0000000000000002
[  489.558256] R13: 0000000000000000 R14: 00007fffdb4f7f80 R15: 0000556f1da94e98
[  489.558260] INFO: task zpool:14217 blocked for more than 120 seconds.
[  489.565509]       Tainted: P           OE     5.0.9-1-ck #1
[  489.572425] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  489.581748] zpool           D    0 14217  13847 0x00000080
[  489.581750] Call Trace:
[  489.581752]  ? __schedule+0x6d9/0xe60
[  489.581755]  ? __wake_up_common+0x77/0x140
[  489.581757]  ? preempt_count_add+0x79/0xb0
[  489.581808]  ? vdev_open_children+0x160/0x160 [zfs]
[  489.581809]  schedule+0x32/0xb0
[  489.581815]  taskq_wait+0x72/0xc0 [spl]
[  489.581818]  ? wait_woken+0x80/0x80
[  489.581824]  taskq_destroy+0x40/0x160 [spl]
[  489.581874]  vdev_open_children+0x10a/0x160 [zfs]
[  489.581926]  vdev_root_open+0x37/0x120 [zfs]
[  489.581977]  vdev_open+0xa8/0x710 [zfs]
[  489.582028]  ? vdev_cache_purge+0x6e/0x80 [zfs]
[  489.582078]  vdev_reopen+0x2e/0xc0 [zfs]
[  489.582124]  dsl_scan+0x36/0x120 [zfs]
[  489.582175]  zfs_ioc_pool_scan+0x5a/0xd0 [zfs]
[  489.582226]  zfsdev_ioctl+0x6cf/0x8c0 [zfs]
[  489.582229]  do_vfs_ioctl+0xa4/0x630
[  489.582231]  ? handle_mm_fault+0x10a/0x250
[  489.582233]  ? syscall_trace_enter+0x1d3/0x2d0
[  489.582235]  ksys_ioctl+0x60/0x90
[  489.582237]  __x64_sys_ioctl+0x16/0x20
[  489.582238]  do_syscall_64+0x5b/0x170
[  489.582240]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  489.582242] RIP: 0033:0x7fa734019cbb
[  489.582244] Code: Bad RIP value.
[  489.582245] RSP: 002b:00007ffc61385548 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  489.582247] RAX: ffffffffffffffda RBX: 00007ffc61385580 RCX: 00007fa734019cbb
[  489.582248] RDX: 00007ffc61385580 RSI: 0000000000005a07 RDI: 0000000000000003
[  489.582249] RBP: 00007ffc61388f70 R08: 0000000000000008 R09: 00007fa734065800
[  489.582250] R10: 0000563d7ddd4010 R11: 0000000000000246 R12: 0000000000000001
[  489.582251] R13: 0000563d7ddd7570 R14: 0000000000000000 R15: 0000563d7ddd5430
[  489.582254] INFO: task vdev_open:14218 blocked for more than 120 seconds.
[  489.589876]       Tainted: P           OE     5.0.9-1-ck #1
[  489.596781] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  489.606093] vdev_open       D    0 14218      2 0x80000080
[  489.606095] Call Trace:
[  489.606098]  ? __schedule+0x6d9/0xe60
[  489.606101]  ? preempt_count_add+0x79/0xb0
[  489.606151]  ? vdev_open_children+0x160/0x160 [zfs]
[  489.606153]  schedule+0x32/0xb0
[  489.606159]  taskq_wait+0x72/0xc0 [spl]
[  489.606161]  ? wait_woken+0x80/0x80
[  489.606167]  taskq_destroy+0x40/0x160 [spl]
[  489.606217]  vdev_open_children+0x10a/0x160 [zfs]
[  489.606268]  vdev_raidz_open+0x40/0x170 [zfs]
[  489.606319]  vdev_open+0xa8/0x710 [zfs]
[  489.606321]  ? __wake_up_common_lock+0x89/0xc0
[  489.606371]  vdev_open_child+0x1e/0x40 [zfs]
[  489.606378]  taskq_thread+0x2ca/0x490 [spl]
[  489.606380]  ? wake_up_q+0x70/0x70
[  489.606387]  ? task_done+0x90/0x90 [spl]
[  489.606390]  kthread+0x112/0x130
[  489.606392]  ? kthread_park+0x80/0x80
[  489.606394]  ret_from_fork+0x35/0x40

@qix67

qix67 commented May 10, 2019

Hello,

I have the same issue but with different kernel and ZFS versions. I run Gentoo.

Until 2 weeks ago, my kernel was 4.14.52 with ZoL 0.7.6. My pools were:

  • root:
    3 way-mirror
    sda, sdb, sdc
    3x240GB SSD

  • data:
    raid-z1
    sdd, sde, sdf
    3x2TB 5900rpm HD

Two weeks ago, I slightly upgraded my hardware and updated the kernel and ZoL. The kernel is now 4.19.27 with ZoL 0.7.13. My pools are:

  • boot:
    3 way-mirror
    sda1, sdb1, sdc1
    3x300MB SSD partition

  • root:
    2 way-mirror
    nvme0, nvme1
    2x480GB NVMe

  • data
    raid-z1
    sdd, sde, sdf
    3x2TB 5900rpm HD

    • ZIL mirror:
      sdb2, sdc2
      2x239GB SSD
    • cache:
      sda2
      239GB SSD

The previous setup worked flawlessly for a year, nearly without a reboot, scrubbing the root pool once per week.

Since the update, I have no problem scrubbing the boot and data pools, but most of the time the kernel hangs when I scrub the root pool, always without any message; I can still switch between ttys. Each time, I scrub the boot pool and then the root pool. After each crash, my mariadb database is corrupted beyond recovery and I have to roll back a zfs snapshot to restart it.

Yesterday evening, after one week of uptime, the kernel hung when the root pool scrub started. I rebooted my computer, removed the data pool's ZIL and cache devices, and was able to scrub the root pool without problems.

This morning, I re-added the data pool's ZIL and cache devices and still encountered no problem scrubbing the root pool.

Server activity is quite low. Hardware is a Xeon L5640 (6c/12t) with 72GB ECC RAM. At the moment, more than 32GB of RAM is free and the CPU load average is roughly 0.5.

Pool capacities are:

  • boot 10%
  • root 37%
  • data 65%
    At most, the data ZIL contained 1GB and the cache was nearly full (after 1 week of uptime).

NVMe, SSD and HD temperatures are OK. The NVMe drives stay below 47°C during a scrub, the SSDs below 23°C and the HDs below 30°C.

No hardware problem seems to exist.

Thanks for all your work.

@xyzpublic

0.7.5 with kernel 4.15. While resuming from S3 sleep, the kernel randomly freezes. Any connection here?

@marker5a
Author

marker5a commented May 23, 2019 via email

@qix67

qix67 commented May 23, 2019

I also think so, as scrubbing root on a freshly booted computer works for me.

Regarding my previous post, I confirm the problem. I disabled root scrubbing, and since then my server has not crashed.

I do not use suspend/hibernate on my server.

@behlendorf removed this from the 0.8.0 milestone May 23, 2019
@behlendorf added the Type: Defect (Incorrect behavior, e.g. crash, hang) label and removed the Type: Building (Indicates an issue related to building binaries) label May 23, 2019
@vanes32

vanes32 commented Aug 5, 2019

I have the same issue on my 2 servers. Both are using root pools. The pool is newly created. When I run a scrub right after reboot, it runs fine. After some uptime, the issue happens when I run a scrub. Here are logs (journalctl -f and dmesg -wT) from when the scrub issue happens. The hardware is 100% fine.
journalctl -f.txt
dmesg -wT.txt

@marker5a closed this as completed Aug 8, 2019
@marker5a reopened this Aug 8, 2019
@marker5a
Author

marker5a commented Aug 8, 2019

I have the same issue on my 2 servers. Both are using root pools. The pool is newly created. When I run a scrub right after reboot, it runs fine. After some uptime, the issue happens when I run a scrub. Here are logs (journalctl -f and dmesg -wT) from when the scrub issue happens. The hardware is 100% fine.
journalctl -f.txt
dmesg -wT.txt

Does the same behavior occur if you scrub the pool as a non-root pool? I'm curious to see whether this is tied to it being a root fs as well.

@vanes32

vanes32 commented Aug 8, 2019

I will try to create an additional non-root pool, do some tests, and write my results here.

@marker5a
Author

marker5a commented Aug 8, 2019

@behlendorf FWIW, I do have an interesting data point for this... not sure if it's helpful. If I initiate the scrub from my initramfs, before the real root is handed off, the scrub behaves normally and proceeds OK. From that point on, I can finish booting into the system and the scrub continues without issue.
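For anyone wanting to try the same trick, a rough sketch (the pool name and initramfs details are only examples; break=premount is the mkinitcpio hook, rd.break the dracut equivalent):

  # boot with break=premount (mkinitcpio) or rd.break (dracut) to get an initramfs shell
  zpool import -N zroot    # import without mounting
  zpool scrub zroot        # start the scrub here, then let the boot continue
  exit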

@vanes32

vanes32 commented Aug 11, 2019

Today was auto-scrub day. One of my two servers hung during it. This is the last thing I saw:
IMG_20190811_093108

@marker5a
Author

Today was auto-scrub day. One of my two servers hung during it. This is the last thing I saw:
IMG_20190811_093108

Can you provide specs for your server? CPU and RAM? I've only got 8GB on my machine, and given the point that @behlendorf made about cgroups, I suspect that maybe I just don't have enough RAM to initiate the scrub in the root environment. I did try downgrading to older kernels without success, but I'm still wondering if memory restrictions are at play here.

@vanes32

vanes32 commented Aug 12, 2019

This is my test server: a small J4205 with 8GB of RAM running the latest Proxmox VE 6.0.5 with ZFS 0.8.1, and zfs_arc_max set to 3GB. The problem began with the new kernel.

@pongraczi

Hi, there is now a long thread on the Proxmox forum about this kind of issue:
https://forum.proxmox.com/threads/proxmox-v6-servers-freeze-zvol-blocked-for-more-than-120s.57765/

In short:
Kernel 5.x, ZFS 0.8.x, systems hang unpredictably; some users are able to trigger the hang by starting a scrub directly.
It does not seem to be related to memory, CPU, or zpool layout; a wide range of configs is affected.

@pongraczi

On the forum, wasteground posted a kernel log about the situation:
proxmox3.log

He wrote this:

Okay so I managed to move things around a little earlier than expected, and so the new information
from me is:

  • I stopped the only VM on this machine (my storage VM providing backup space for the proxmox cluster), so no workloads are running on the node
  • I did "echo 1 > /proc/sys/kernel/sysrq" and "echo t > /proc/sysrq-trigger" with remote syslog enabled

I ran one "echo t" initially, then started the zpool scrub, then managed to run "echo t" several more times before everything started falling apart on the box (and then I had to power cycle it, it returns to service just fine and shows nothing about the zpool scrub at all - so I guess the scrub actually never really starts). Trying to run any process on the box at this point that requires disk I/O just stalls and cannot be returned from. Anything in memory/already running that doesn't need disk I/O seems to keep running just fine.

The log is attached - I annotated each section with ">>>" to show where the sysrq-triggers were fired, so hopefully this provides some useful info.

The last "section" of the log shows processes getting stuck because of no I/O i think. It's almost exactly as @sandor mentioned, it's like the box lost all disks.

@behlendorf
Contributor

@pongraczi thank you for posting the logs. I believe I see what's causing the reported hangs. As a workaround you can try setting zfs_vdev_scheduler=none. This should avoid the problem; it looks like 4.12 and newer kernels might encounter it.
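Roughly, on a typical install (the paths are the usual ones, adjust for your distribution; on root-on-ZFS you will also want to regenerate the initramfs so the option takes effect at pool import):

  echo none > /sys/module/zfs/parameters/zfs_vdev_scheduler                # at runtime, if the module is loaded and the parameter is writable
  echo "options zfs zfs_vdev_scheduler=none" >> /etc/modprobe.d/zfs.conf   # persists across reboots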

@pongraczi

@behlendorf thank you, I posted it on the Proxmox forum; I guess some users with such systems will test it soon.

@pongraczi

Just for the record, there are scheduler checks from before and after on the Proxmox forum, and it seems:

  • the /sys/block/sdX/queue/scheduler is [mq-deadline] none before and after, too, so no changes
  • the /sys/module/zfs/parameters/zfs_vdev_scheduler changed from noop to none

One of my systems was not affected, even though it is a remote backup server reached over the internet with really slow bandwidth. It has only hard disks as vdevs; only the log/ZIL is on SSD.

Please check the Proxmox forum for details, as other users posted their configs there.
Just let me know if you need them posted here, and I will copy them over.

@pongraczi

Example of an affected config:

root@lm1:~# zpool status
  pool: rpool
state: ONLINE
  scan: scrub repaired 0B in 0 days 01:08:43 with 0 errors on Sun Jul 14 01:32:44 2019
config:

    NAME                                             STATE     READ WRITE CKSUM
    rpool                                            ONLINE       0     0     0
      mirror-0                                       ONLINE       0     0     0
        sde2                                         ONLINE       0     0     0
        sdf2                                         ONLINE       0     0     0
      mirror-1                                       ONLINE       0     0     0
        ata-Samsung_SSD_850_PRO_1TB_S252NXAG805442N  ONLINE       0     0     0
        ata-Samsung_SSD_850_PRO_1TB_S252NXAG805106B  ONLINE       0     0     0

errors: No known data errors

  pool: zbackup
state: ONLINE
  scan: scrub repaired 0B in 1 days 03:32:58 with 0 errors on Mon Jul 15 03:57:00 2019
config:

    NAME                                          STATE     READ WRITE CKSUM
    zbackup                                       ONLINE       0     0     0
      mirror-0                                    ONLINE       0     0     0
        ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M3UZC1F1  ONLINE       0     0     0
        ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M7ZESTA2  ONLINE       0     0     0
      mirror-1                                    ONLINE       0     0     0
        ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M7ZEST6D  ONLINE       0     0     0
        ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M6AVE83Y  ONLINE       0     0     0

errors: No known data errors

The following is the same before and after changing zfs_vdev_scheduler:

root@lm1:~# for i in a b c d e f g h; do cat /sys/block/sd$i/queue/scheduler; done
[mq-deadline] none
[mq-deadline] none
[mq-deadline] none
[mq-deadline] none
[mq-deadline] none
[mq-deadline] none
[mq-deadline] none
[mq-deadline] none

Before:

root@lm1:~# cat /sys/module/zfs/parameters/zfs_vdev_scheduler
noop

After:

root@lm1:~# cat /sys/module/zfs/parameters/zfs_vdev_scheduler
none

behlendorf added a commit to behlendorf/zfs that referenced this issue Sep 11, 2019
Originally the zfs_vdev_elevator module option was added as a
convenience so the requested elevator would be automatically set
on the underlying block devices.  At the time this was simple
because the kernel provided an API function which did exactly this.

This API was then removed in the Linux 4.12 kernel, which prompted
us to add compatibility code to set the elevator via a usermodehelper.
While well intentioned, this can result in a system hang if the
usermodehelper does not return.  This is because it may be called
with the spa config lock held as a writer, for example via the
zfs_ioc_pool_scan() -> spa_scan() -> spa_scan() -> vdev_reopen()
call path.

Additionally, there's a reasonable argument that setting the elevator
is the responsibility of whatever software is being used to configure
the system.  Therefore, it's proposed that the zfs_vdev_elevator
option be removed.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#8664
@behlendorf
Contributor

Thanks for verifying the proposed workaround. For those interested, this is caused by an interaction between the scrub, which attempted to reopen the devices, and the compatibility code required for the zfs.ko to set the block device's scheduler. Setting zfs_vdev_scheduler=none prevents the issue by instructing the zfs.ko to never modify the scheduler.

I've opened #9317, which proposes to remove the zfs_vdev_scheduler module option and delegate setting the scheduler as needed entirely to user space. Additional information is provided in the PR and discussion is welcome. This change would apply to the next major release; we should be able to do something less disruptive for the current release.
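As a sketch of what the user-space replacement could look like, a udev rule along these lines (the rule file name and device match are only examples, adjust them for your disks) sets the scheduler for whole disks:

  # /etc/udev/rules.d/66-zfs-scheduler.rules
  ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="none"
  # then reload: udevadm control --reload-rules && udevadm trigger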

behlendorf pushed a commit that referenced this issue Sep 14, 2019
#9321)

Originally the zfs_vdev_elevator module option was added as a
convenience so the requested elevator would be automatically set
on the underlying block devices. At the time this was simple
because the kernel provided an API function which did exactly this.

This API was then removed in the Linux 4.12 kernel, which prompted
us to add compatibility code to set the elevator via a usermodehelper.

Unfortunately, changing the elevator via usermodehelper requires reading
some userland binaries, most notably modprobe(8) or sh(1), from a zfs
dataset on systems with root-on-zfs. This can deadlock the system if
used during the following call path, because it may need to read
directly from disk (if the data is not already cached in the ARC) while
holding the spa config lock as a writer:

  zfs_ioc_pool_scan()
    -> spa_scan()
      -> spa_scan()
        -> vdev_reopen()
          -> vdev_elevator_switch()
            -> call_usermodehelper()

While the usermodehelper waits for sh(1), modprobe(8) is blocked in the
ZIO pipeline trying to read from disk:

  INFO: task modprobe:2650 blocked for more than 10 seconds.
       Tainted: P           OE     5.2.14
  modprobe        D    0  2650    206 0x00000000
  Call Trace:
   ? __schedule+0x244/0x5f0
   schedule+0x2f/0xa0
   cv_wait_common+0x156/0x290 [spl]
   ? do_wait_intr_irq+0xb0/0xb0
   spa_config_enter+0x13b/0x1e0 [zfs]
   zio_vdev_io_start+0x51d/0x590 [zfs]
   ? tsd_get_by_thread+0x3b/0x80 [spl]
   zio_nowait+0x142/0x2f0 [zfs]
   arc_read+0xb2d/0x19d0 [zfs]
   ...
   zpl_iter_read+0xfa/0x170 [zfs]
   new_sync_read+0x124/0x1b0
   vfs_read+0x91/0x140
   ksys_read+0x59/0xd0
   do_syscall_64+0x4f/0x130
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

This commit changes how we use the usermodehelper functionality from
synchronous (UMH_WAIT_PROC) to asynchronous (UMH_NO_WAIT) which prevents
scrubs, and other vdev_elevator_switch() consumers, from triggering the
aforementioned issue.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Issue #8664 
Closes #9321
@cyril23

cyril23 commented Sep 14, 2019

@behlendorf Maybe you could have a look at my logs at https://forum.proxmox.com/threads/proxmox-v6-servers-freeze-zvol-blocked-for-more-than-120s.57765/post-267351 just to confirm your solution fits my case, too? That would make me feel a lot better! Thanks

behlendorf added a commit that referenced this issue Sep 25, 2019
Originally the zfs_vdev_elevator module option was added as a
convenience so the requested elevator would be automatically set
on the underlying block devices.  At the time this was simple
because the kernel provided an API function which did exactly this.

This API was then removed in the Linux 4.12 kernel, which prompted
us to add compatibility code to set the elevator via a usermodehelper.
While well intentioned, this introduced a bug which could cause a
system hang; that issue was subsequently fixed by commit 2a0d418.

In order to avoid future bugs in this area, and to simplify the code,
this functionality is being deprecated.  A console warning has been
added to notify any existing consumers and the documentation updated
accordingly.  This option will remain for the lifetime of the 0.8.x
series for compatibility but is planned to be phased out of master.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #8664
Closes #9317
behlendorf added a commit that referenced this issue Nov 27, 2019
As described in commit f81d5ef the zfs_vdev_elevator module
option is being removed.  Users who require this functionality
should update their systems to set the disk scheduler using a
udev rule.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #8664
Closes #9417
Closes #9609