zpool import deadlock with 0.7.5 (imports fine with 0.6.5.6 in ro mode) #8100

Closed
rejsmont opened this issue Nov 6, 2018 · 4 comments

rejsmont commented Nov 6, 2018

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Ubuntu |
| Distribution Version | 18.04 |
| Linux Kernel | 4.15.0-38-generic |
| Architecture | x86_64 |
| ZFS Version | 0.7.5-1ubuntu15 |
| SPL Version | 0.7.5-1ubuntu1 |

Describe the problem you're observing

zpool import deadlocks when importing a pool created with the 0.7 version of the ZFS module. I am able to import the same pool with ZFS 0.6.5.

Describe how to reproduce the problem

Unknown. The problem suddenly appeared after an upgrade. I was able to reproduce this issue on several machines, both virtual and bare metal. I can provide an image of the pool in question (4G).

Include any warning/errors/backtraces from the system logs

From kernel log:

[    0.045394] cpu 0 spinlock event irq 53
[    0.075787] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
[    0.979793] Grant table initialized
[    1.195686] mce: Unable to init MCE device (rc: -5)
[    1.929728] piix4_smbus 0000:00:01.3: SMBus Host Controller not enabled!
[    2.136706] spl: loading out-of-tree module taints kernel.
[    2.151423] znvpair: module license 'CDDL' taints kernel.
[    2.159127] Disabling lock debugging due to kernel taint
[  242.656151] INFO: task zpool:1083 blocked for more than 120 seconds.
[  242.665642]       Tainted: P           O     4.15.0-38-generic #41-Ubuntu
[  242.674866] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.685825] INFO: task txg_sync:2168 blocked for more than 120 seconds.
[  242.694072]       Tainted: P           O     4.15.0-38-generic #41-Ubuntu
[  242.701569] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

When importing the pool from a hotplugged drive (or from a drive image), dmesg shows the following before zpool hangs:

[72192.717406] WARNING: Pool 'lxd-data-2' has encountered an uncorrectable I/O failure and has been suspended.

[72379.365792] INFO: task l2arc_feed:772 blocked for more than 120 seconds.
[72379.365799]       Tainted: P           OE    4.15.0-39-generic #42-Ubuntu
[72379.365802] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[72379.365805] l2arc_feed      D    0   772      2 0x80000000
[72379.365810] Call Trace:
[72379.365821]  __schedule+0x291/0x8a0
[72379.365827]  schedule+0x2c/0x80
[72379.365830]  schedule_preempt_disabled+0xe/0x10
[72379.365834]  __mutex_lock.isra.2+0x18c/0x4d0
[72379.365838]  ? _cond_resched+0x19/0x40
[72379.365853]  ? __cv_timedwait_common+0xe6/0x160 [spl]
[72379.365857]  __mutex_lock_slowpath+0x13/0x20
[72379.365861]  ? __mutex_lock_slowpath+0x13/0x20
[72379.365864]  mutex_lock+0x2f/0x40
[72379.365945]  l2arc_feed_thread+0xe3/0x430 [zfs]
[72379.365984]  ? l2arc_evict+0x340/0x340 [zfs]
[72379.365992]  thread_generic_wrapper+0x74/0x90 [spl]
[72379.366000]  kthread+0x121/0x140
[72379.366007]  ? __thread_exit+0x20/0x20 [spl]
[72379.366011]  ? kthread_create_worker_on_cpu+0x70/0x70
[72379.366018]  ret_from_fork+0x35/0x40
[72379.366096] INFO: task zpool:23230 blocked for more than 120 seconds.
[72379.366101]       Tainted: P           OE    4.15.0-39-generic #42-Ubuntu
[72379.366104] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[72379.366108] zpool           D    0 23230  23223 0x00000004
[72379.366113] Call Trace:
[72379.366119]  __schedule+0x291/0x8a0
[72379.366126]  ? __wake_up_common+0x73/0x130
[72379.366130]  schedule+0x2c/0x80
[72379.366143]  cv_wait_common+0x11e/0x140 [spl]
[72379.366148]  ? wait_woken+0x80/0x80
[72379.366160]  __cv_wait+0x15/0x20 [spl]
[72379.366256]  txg_wait_synced+0xdd/0x130 [zfs]
[72379.366320]  spa_load+0x1aa9/0x2130 [zfs]
[72379.366378]  spa_load_best+0x5d/0x280 [zfs]
[72379.366387]  ? zpool_get_rewind_policy+0x18b/0x1a0 [zcommon]
[72379.366441]  spa_import+0x220/0x730 [zfs]
[72379.366506]  zfs_ioc_pool_import+0x13c/0x150 [zfs]
[72379.366589]  zfsdev_ioctl+0x1e0/0x610 [zfs]
[72379.366599]  do_vfs_ioctl+0xa8/0x630
[72379.366605]  ? handle_mm_fault+0xb1/0x1f0
[72379.366612]  ? __do_page_fault+0x270/0x4d0
[72379.366617]  ? do_sys_open+0x13e/0x2c0
[72379.366621]  SyS_ioctl+0x79/0x90
[72379.366628]  do_syscall_64+0x73/0x130
[72379.366633]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[72379.366636] RIP: 0033:0x7f27b96375d7
[72379.366638] RSP: 002b:00007ffc48f9ef78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[72379.366642] RAX: ffffffffffffffda RBX: 00007ffc48fa25b0 RCX: 00007f27b96375d7
[72379.366645] RDX: 00007ffc48fa25b0 RSI: 0000000000005a02 RDI: 0000000000000003
[72379.366647] RBP: 000055afd45ef660 R08: 0000000000000003 R09: 0000000000000000
[72379.366649] R10: 000055afd45ee010 R11: 0000000000000246 R12: 000055afd45f1750
[72379.366652] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[72379.366671] INFO: task txg_sync:24801 blocked for more than 120 seconds.
[72379.366677]       Tainted: P           OE    4.15.0-39-generic #42-Ubuntu
[72379.366681] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[72379.366684] txg_sync        D    0 24801      2 0x80000000
[72379.366689] Call Trace:
[72379.366695]  __schedule+0x291/0x8a0
[72379.366700]  schedule+0x2c/0x80
[72379.366705]  io_schedule+0x16/0x40
[72379.366715]  cv_wait_common+0xb2/0x140 [spl]
[72379.366720]  ? wait_woken+0x80/0x80
[72379.366728]  __cv_wait_io+0x18/0x20 [spl]
[72379.366789]  zio_wait+0xf8/0x1b0 [zfs]
[72379.366843]  dsl_pool_sync+0x3e5/0x430 [zfs]
[72379.366910]  spa_sync+0x43e/0xd80 [zfs]
[72379.366976]  txg_sync_thread+0x2cd/0x4a0 [zfs]
[72379.366981]  ? __switch_to_asm+0x40/0x70
[72379.367052]  ? txg_quiesce_thread+0x3d0/0x3d0 [zfs]
[72379.367060]  thread_generic_wrapper+0x74/0x90 [spl]
[72379.367066]  kthread+0x121/0x140
[72379.367073]  ? __thread_exit+0x20/0x20 [spl]
[72379.367077]  ? kthread_create_worker_on_cpu+0x70/0x70
[72379.367081]  ? kthread_create_worker_on_cpu+0x70/0x70
[72379.367086]  ret_from_fork+0x35/0x40
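
For anyone triaging a trace like the one above, a small script can pull out which task is stuck in which ZFS function. This is only a sketch: the names `summarize_hung_tasks` and `first_zfs_frame` are made up for illustration, and the regexes assume the hung-task format shown in this report. Frames prefixed with `?` are skipped because the kernel marks them as unreliable. The sample is an excerpt of the log above.

```python
import re

# Hung-task header, e.g.
# "[72379.366096] INFO: task zpool:23230 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# Stack frame that belongs to a module, e.g.
# "[72379.366256]  txg_wait_synced+0xdd/0x130 [zfs]"
FRAME_RE = re.compile(
    r"\]\s+(?P<maybe>\? )?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(?P<mod>\w+)\]"
)


def summarize_hung_tasks(dmesg_text):
    """Return one dict per hung-task report with its reliable module frames."""
    reports = []
    current = None
    for line in dmesg_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            current = {"task": header["name"], "pid": int(header["pid"]), "frames": []}
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        # Ignore frames seen before any header and frames marked "?".
        if frame and current is not None and not frame["maybe"]:
            current["frames"].append((frame["mod"], frame["func"]))
    return reports


def first_zfs_frame(report):
    """Return the innermost frame inside the zfs module, or None if there is none."""
    for mod, func in report["frames"]:
        if mod == "zfs":
            return func
    return None


SAMPLE = """\
[72379.366096] INFO: task zpool:23230 blocked for more than 120 seconds.
[72379.366113] Call Trace:
[72379.366119]  __schedule+0x291/0x8a0
[72379.366143]  cv_wait_common+0x11e/0x140 [spl]
[72379.366148]  ? wait_woken+0x80/0x80
[72379.366160]  __cv_wait+0x15/0x20 [spl]
[72379.366256]  txg_wait_synced+0xdd/0x130 [zfs]
[72379.366320]  spa_load+0x1aa9/0x2130 [zfs]
[72379.366671] INFO: task txg_sync:24801 blocked for more than 120 seconds.
[72379.366689] Call Trace:
[72379.366715]  cv_wait_common+0xb2/0x140 [spl]
[72379.366728]  __cv_wait_io+0x18/0x20 [spl]
[72379.366789]  zio_wait+0xf8/0x1b0 [zfs]
[72379.366843]  dsl_pool_sync+0x3e5/0x430 [zfs]
"""

if __name__ == "__main__":
    for report in summarize_hung_tasks(SAMPLE):
        print(f"{report['task']} (pid {report['pid']}) waits in {first_zfs_frame(report)}")
    # zpool (pid 23230) waits in txg_wait_synced
    # txg_sync (pid 24801) waits in zio_wait
```

On this excerpt the import is waiting in txg_wait_synced while txg_sync itself is waiting in zio_wait, which would be consistent with the sync thread being unable to finish I/O after the pool was suspended.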
@rejsmont changed the title from "ZFS root boot hangs at kernel loading" to "zpool import deadlock with 0.7.5 (imports fine with 0.6.5.6 in ro mode)" on Nov 8, 2018
@rejsmont
Author

rejsmont commented Nov 8, 2018

Hi!

I did a bit more digging into the matter (as I also wanted to save my data) and found that I may be onto an actual bug in 0.7.

@kpande Please see the edited bug report.

@rincebrain
Contributor

Related to #7659 maybe?

You could also try importing with the l2arc device not connected; since persistent L2ARC hasn't landed, nothing of value will be lost. If that works, and you can reproduce the hang by re-adding the l2arc device, that'd be exciting. If you don't have one, I suppose that'd be even more exciting.

@rejsmont
Author

rejsmont commented Nov 8, 2018

@rincebrain Hi! Thanks for the suggestion. Unfortunately, it's a single-disk pool, so there is no separate l2arc device (it was originally running under a VM as an LXD storage backend, so it made no sense to spread it over multiple volumes). I will zfs send/receive to a fresh pool using 0.6.5.6 and see if it helps.

EDIT: And yes, I do not have an l2arc device, so I guess that could be exciting.

Thanks!

3 participants
@rincebrain @rejsmont and others