-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kernel hang when removing faulted log device #1585
Comments
It appears that trying to remove the faulted log device causes zfs to try and read from it. This read basically hangs while holding an important lock which causes everything to block. Thanks for the bug report and stacks so we can fix this case. |
Right, that explains why, after rebooting, I could remove the log device without problems. |
@behlendorf Could this still be a problem? |
@FransUrbo I suspect this is still and issue, let's treat it as such until we know differently. |
Just ran into a lockup removing half a mirror of slog (2 partitions, one per SSD, in a mirror), so yeah, still happening.
|
Closing as stale. If anyone still hitting this let us know and we'll reopen it. |
A log device (the only one) on my pool faulted (due to too many I/O errors), after which a background task that was copying data just continued copying.
However, when trying to zpool remove the log device from the pool, all activity stopped, and I was hit with this:
[436202.280203] INFO: task l2arc_feed:1755 blocked for more than 120 seconds.
[436202.298763] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[436202.312515] l2arc_feed D ffff88020fc13f40 0 1755 2 0x00000000
[436202.312524] ffff8801f0683d28 0000000000000046 ffff8801de07ae80 ffff8801f0683fd8
[436202.312531] ffff8801f0683fd8 ffff8801f0683fd8 ffff8801de07c5c0 ffff8801de07ae80
[436202.312535] ffffffffa045d5a0 ffff8801de07ae80 ffffffffa045d5a4 00000000ffffffff
[436202.312542] Call Trace:
[436202.312559] [] schedule+0x29/0x70
[436202.312575] [] schedule_preempt_disabled+0xe/0x10
[436202.312581] [] __mutex_lock_slowpath+0xd7/0x150
[436202.312588] [] mutex_lock+0x2a/0x50
[436202.312616] [] l2arc_feed_thread+0xaa/0x850 [zfs]
[436202.312646] [] ? l2arc_evict+0x300/0x300 [zfs]
[436202.312663] [] thread_generic_wrapper+0x78/0x90 [spl]
[436202.312674] [] ? __thread_create+0x340/0x340 [spl]
[436202.312680] [] kthread+0xc0/0xd0
[436202.312686] [] ? kthread_create_on_node+0x120/0x120
[436202.312693] [] ret_from_fork+0x7c/0xb0
[436202.312698] [] ? kthread_create_on_node+0x120/0x120
[436202.312716] INFO: task txg_quiesce:2190 blocked for more than 120 seconds.
[436202.320975] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[436202.327643] txg_quiesce D ffff88020fc33f40 0 2190 2 0x00000000
[436202.327655] ffff8801d7c21d68 0000000000000046 ffff8801db45dd00 ffff8801d7c21fd8
[436202.327662] ffff8801d7c21fd8 ffff8801d7c21fd8 ffff880204149740 ffff8801db45dd00
[436202.327669] ffffc90001040258 ffffc900010401f0 ffffc90001040260 0000000000000000
[436202.327676] Call Trace:
[436202.327692] [] schedule+0x29/0x70
[436202.327717] [] cv_wait_common+0xfd/0x1b0 [spl]
[436202.327741] [] ? finish_wait+0x80/0x80
[436202.327766] [] __cv_wait+0x15/0x20 [spl]
[436202.327839] [] txg_quiesce_thread+0x223/0x380 [zfs]
[436202.327909] [] ? txg_fini+0x270/0x270 [zfs]
[436202.327927] [] thread_generic_wrapper+0x78/0x90 [spl]
[436202.327943] [] ? __thread_create+0x340/0x340 [spl]
[436202.327950] [] kthread+0xc0/0xd0
[436202.327957] [] ? kthread_create_on_node+0x120/0x120
[436202.327966] [] ret_from_fork+0x7c/0xb0
[436202.327972] [] ? kthread_create_on_node+0x120/0x120
[436202.328020] INFO: task cp:8781 blocked for more than 120 seconds.
[436202.333279] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[436202.339960] cp D ffff88020fc33f40 0 8781 8662 0x00000000
[436202.339970] ffff88012da35b98 0000000000000086 ffff8801d71a8000 ffff88012da35fd8
[436202.339978] ffff88012da35fd8 ffff88012da35fd8 ffff880204149740 ffff8801d71a8000
[436202.339992] ffff8801dbba6a90 ffff8801dbba69f8 ffff8801dbba6a98 0000000000000000
[436202.340000] Call Trace:
[436202.340013] [] schedule+0x29/0x70
[436202.340034] [] cv_wait_common+0xfd/0x1b0 [spl]
[436202.340041] [] ? finish_wait+0x80/0x80
[436202.340070] [] __cv_wait+0x15/0x20 [spl]
[436202.340157] [] txg_wait_synced+0xb3/0x190 [zfs]
[436202.340235] [] dmu_tx_wait+0xc5/0xf0 [zfs]
[436202.340302] [] zfs_write+0x356/0xc50 [zfs]
[436202.340311] [] ? file_read_actor+0xd5/0x180
[436202.340376] [] zpl_write_common+0x52/0x80 [zfs]
[436202.340436] [] zpl_write+0x6b/0xa0 [zfs]
[436202.340445] [] vfs_write+0xac/0x180
[436202.340452] [] sys_write+0x52/0xa0
[436202.340461] [] system_call_fastpath+0x1a/0x1f
[436202.340472] INFO: task zpool:13391 blocked for more than 120 seconds.
[436202.346043] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[436202.353515] zpool D ffff88020fc33f40 0 13391 13390 0x00000004
[436202.353520] ffff8801025ad718 0000000000000082 ffff88001ad42e80 ffff8801025adfd8
[436202.353524] ffff8801025adfd8 ffff8801025adfd8 ffff88001e581740 ffff88001ad42e80
[436202.353527] ffff88001ad42e80 ffff88020fc347f8 ffff8800c36c32e0 0000000000000001
[436202.353531] Call Trace:
[436202.353537] [] schedule+0x29/0x70
[436202.353554] [] io_schedule+0x8f/0xd0
[436202.353576] [] cv_wait_common+0xa8/0x1b0 [spl]
[436202.353583] [] ? finish_wait+0x80/0x80
[436202.353600] [] __cv_wait_io+0x18/0x20 [spl]
[436202.353677] [] zio_wait+0x103/0x1a0 [zfs]
[436202.353730] [] arc_read_nolock+0x511/0x820 [zfs]
[436202.353792] [] dsl_read_nolock+0x2a/0x30 [zfs]
[436202.353861] [] zil_parse+0x2c1/0x820 [zfs]
[436202.353902] [] ? dmu_buf_rele+0x2f/0x40 [zfs]
[436202.353963] [] ? zil_itxg_clean+0x160/0x160 [zfs]
[436202.354023] [] ? zil_claim_log_record+0x80/0x80 [zfs]
[436202.354038] [] ? trace_unlock_tcd+0x16/0x20 [spl]
[436202.354052] [] ? spl_debug_msg+0x411/0x840 [spl]
[436202.354112] [] zil_destroy_sync+0x26/0x30 [zfs]
[436202.354170] [] zil_destroy+0x191/0x220 [zfs]
[436202.354229] [] zil_suspend+0xb3/0xd0 [zfs]
[436202.354286] [] zil_vdev_offline+0x54/0x90 [zfs]
[436202.354338] [] findfunc+0x12/0x20 [zfs]
[436202.354379] [] dmu_objset_find_spa+0xf6/0x440 [zfs]
[436202.354408] [] ? dmu_objset_userspace_present+0x20/0x20 [zfs]
[436202.354436] [] dmu_objset_find_spa+0x191/0x440 [zfs]
[436202.354490] [] ? dmu_objset_userspace_present+0x20/0x20 [zfs]
[436202.354537] [] dmu_objset_find_spa+0x191/0x440 [zfs]
[436202.354582] [] ? dmu_objset_userspace_present+0x20/0x20 [zfs]
[436202.354610] [] dmu_objset_find+0x2d/0x30 [zfs]
[436202.354680] [] ? zil_replaying+0x90/0x90 [zfs]
[436202.354743] [] spa_offline_log+0x2c/0x50 [zfs]
[436202.354803] [] spa_vdev_remove+0x398/0x3c0 [zfs]
[436202.354866] [] zfs_ioc_vdev_remove+0x37/0x60 [zfs]
[436202.354927] [] zfsdev_ioctl+0xdd/0x1d0 [zfs]
[436202.354935] [] do_vfs_ioctl+0x99/0x570
[436202.354946] [] ? kmem_cache_free+0x2f/0x130
[436202.354952] [] ? final_putname+0x22/0x50
[436202.354959] [] ? putname+0x29/0x40
[436202.354965] [] sys_ioctl+0x91/0xb0
[436202.354973] [] ? do_page_fault+0xe/0x10
[436202.354982] [] system_call_fastpath+0x1a/0x1f
[436322.352212] INFO: task l2arc_feed:1755 blocked for more than 120 seconds.
[436322.364603] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[436322.377256] l2arc_feed D ffff88020fc13f40 0 1755 2 0x00000000
[436322.377269] ffff8801f0683d28 0000000000000046 ffff8801de07ae80 ffff8801f0683fd8
[436322.377277] ffff8801f0683fd8 ffff8801f0683fd8 ffff8801de07c5c0 ffff8801de07ae80
[436322.377284] ffffffffa045d5a0 ffff8801de07ae80 ffffffffa045d5a4 00000000ffffffff
[436322.377292] Call Trace:
[436322.377326] [] schedule+0x29/0x70
[436322.377336] [] schedule_preempt_disabled+0xe/0x10
[436322.377345] [] __mutex_lock_slowpath+0xd7/0x150
[436322.377354] [] mutex_lock+0x2a/0x50
[436322.377398] [] l2arc_feed_thread+0xaa/0x850 [zfs]
[436322.377443] [] ? l2arc_evict+0x300/0x300 [zfs]
[436322.377463] [] thread_generic_wrapper+0x78/0x90 [spl]
[436322.377479] [] ? __thread_create+0x340/0x340 [spl]
[436322.377487] [] kthread+0xc0/0xd0
[436322.377494] [] ? kthread_create_on_node+0x120/0x120
[436322.377504] [] ret_from_fork+0x7c/0xb0
[436322.377510] [] ? kthread_create_on_node+0x120/0x120
[436322.377535] INFO: task txg_quiesce:2190 blocked for more than 120 seconds.
[436322.390194] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[436322.402640] txg_quiesce D ffff88020fc33f40 0 2190 2 0x00000000
[436322.402645] ffff8801d7c21d68 0000000000000046 ffff8801db45dd00 ffff8801d7c21fd8
[436322.402661] ffff8801d7c21fd8 ffff8801d7c21fd8 ffff880204149740 ffff8801db45dd00
[436322.402667] ffffc90001040258 ffffc900010401f0 ffffc90001040260 0000000000000000
[436322.402687] Call Trace:
[436322.402698] [] schedule+0x29/0x70
[436322.402719] [] cv_wait_common+0xfd/0x1b0 [spl]
[436322.402727] [] ? finish_wait+0x80/0x80
[436322.402752] [] __cv_wait+0x15/0x20 [spl]
[436322.402823] [] txg_quiesce_thread+0x223/0x380 [zfs]
[436322.402893] [] ? txg_fini+0x270/0x270 [zfs]
[436322.402910] [] thread_generic_wrapper+0x78/0x90 [spl]
[436322.402926] [] ? __thread_create+0x340/0x340 [spl]
[436322.402932] [] kthread+0xc0/0xd0
[436322.402939] [] ? kthread_create_on_node+0x120/0x120
[436322.402948] [] ret_from_fork+0x7c/0xb0
[436322.402954] [] ? kthread_create_on_node+0x120/0x120
[436322.403000] INFO: task cp:8781 blocked for more than 120 seconds.
[436322.408333] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[436322.415812] cp D ffff88020fc33f40 0 8781 8662 0x00000000
[436322.415828] ffff88012da35b98 0000000000000086 ffff8801d71a8000 ffff88012da35fd8
[436322.415835] ffff88012da35fd8 ffff88012da35fd8 ffff880204149740 ffff8801d71a8000
[436322.415842] ffff8801dbba6a90 ffff8801dbba69f8 ffff8801dbba6a98 0000000000000000
[436322.415860] Call Trace:
[436322.415872] [] schedule+0x29/0x70
[436322.415892] [] cv_wait_common+0xfd/0x1b0 [spl]
[436322.415899] [] ? finish_wait+0x80/0x80
[436322.415924] [] __cv_wait+0x15/0x20 [spl]
[436322.415991] [] txg_wait_synced+0xb3/0x190 [zfs]
[436322.416048] [] dmu_tx_wait+0xc5/0xf0 [zfs]
[436322.416158] [] zfs_write+0x356/0xc50 [zfs]
[436322.416172] [] ? file_read_actor+0xd5/0x180
[436322.416237] [] zpl_write_common+0x52/0x80 [zfs]
[436322.416302] [] zpl_write+0x6b/0xa0 [zfs]
[436322.416311] [] vfs_write+0xac/0x180
[436322.416318] [] sys_write+0x52/0xa0
[436322.416327] [] system_call_fastpath+0x1a/0x1f
[436322.416336] INFO: task zpool:13391 blocked for more than 120 seconds.
[436322.422022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[436322.429757] zpool D ffff88020fc33f40 0 13391 13390 0x00000004
[436322.429761] ffff8801025ad718 0000000000000082 ffff88001ad42e80 ffff8801025adfd8
[436322.429765] ffff8801025adfd8 ffff8801025adfd8 ffff88001e581740 ffff88001ad42e80
[436322.429782] ffff88001ad42e80 ffff88020fc347f8 ffff8800c36c32e0 0000000000000001
[436322.429788] Call Trace:
[436322.429799] [] schedule+0x29/0x70
[436322.429823] [] io_schedule+0x8f/0xd0
[436322.429843] [] cv_wait_common+0xa8/0x1b0 [spl]
[436322.429850] [] ? finish_wait+0x80/0x80
[436322.429866] [] __cv_wait_io+0x18/0x20 [spl]
[436322.429937] [] zio_wait+0x103/0x1a0 [zfs]
[436322.429976] [] arc_read_nolock+0x511/0x820 [zfs]
[436322.430037] [] dsl_read_nolock+0x2a/0x30 [zfs]
[436322.430100] [] zil_parse+0x2c1/0x820 [zfs]
[436322.430140] [] ? dmu_buf_rele+0x2f/0x40 [zfs]
[436322.430200] [] ? zil_itxg_clean+0x160/0x160 [zfs]
[436322.430259] [] ? zil_claim_log_record+0x80/0x80 [zfs]
[436322.430273] [] ? trace_unlock_tcd+0x16/0x20 [spl]
[436322.430287] [] ? spl_debug_msg+0x411/0x840 [spl]
[436322.430347] [] zil_destroy_sync+0x26/0x30 [zfs]
[436322.430404] [] zil_destroy+0x191/0x220 [zfs]
[436322.430463] [] zil_suspend+0xb3/0xd0 [zfs]
[436322.430520] [] zil_vdev_offline+0x54/0x90 [zfs]
[436322.430567] [] findfunc+0x12/0x20 [zfs]
[436322.430613] [] dmu_objset_find_spa+0xf6/0x440 [zfs]
[436322.430659] [] ? dmu_objset_userspace_present+0x20/0x20 [zfs]
[436322.430705] [] dmu_objset_find_spa+0x191/0x440 [zfs]
[436322.430749] [] ? dmu_objset_userspace_present+0x20/0x20 [zfs]
[436322.430802] [] dmu_objset_find_spa+0x191/0x440 [zfs]
[436322.430847] [] ? dmu_objset_userspace_present+0x20/0x20 [zfs]
[436322.430892] [] dmu_objset_find+0x2d/0x30 [zfs]
[436322.430951] [] ? zil_replaying+0x90/0x90 [zfs]
[436322.431013] [] spa_offline_log+0x2c/0x50 [zfs]
[436322.431073] [] spa_vdev_remove+0x398/0x3c0 [zfs]
[436322.431136] [] zfs_ioc_vdev_remove+0x37/0x60 [zfs]
[436322.431196] [] zfsdev_ioctl+0xdd/0x1d0 [zfs]
[436322.431205] [] do_vfs_ioctl+0x99/0x570
[436322.431214] [] ? kmem_cache_free+0x2f/0x130
[436322.431221] [] ? final_putname+0x22/0x50
[436322.431227] [] ? putname+0x29/0x40
[436322.431234] [] sys_ioctl+0x91/0xb0
[436322.431242] [] ? do_page_fault+0xe/0x10
[436322.431251] [] system_call_fastpath+0x1a/0x1f
[436442.428213] INFO: task l2arc_feed:1755 blocked for more than 120 seconds.
[436442.439100] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[436442.451505] l2arc_feed D ffff88020fc13f40 0 1755 2 0x00000000
[436442.451516] ffff8801f0683d28 0000000000000046 ffff8801de07ae80 ffff8801f0683fd8
[436442.451524] ffff8801f0683fd8 ffff8801f0683fd8 ffff8801de07c5c0 ffff8801de07ae80
[436442.451530] ffffffffa045d5a0 ffff8801de07ae80 ffffffffa045d5a4 00000000ffffffff
[436442.451538] Call Trace:
[436442.451571] [] schedule+0x29/0x70
[436442.451581] [] schedule_preempt_disabled+0xe/0x10
[436442.451590] [] __mutex_lock_slowpath+0xd7/0x150
[436442.451599] [] mutex_lock+0x2a/0x50
[436442.451649] [] l2arc_feed_thread+0xaa/0x850 [zfs]
[436442.451694] [] ? l2arc_evict+0x300/0x300 [zfs]
[436442.451714] [] thread_generic_wrapper+0x78/0x90 [spl]
[436442.451730] [] ? __thread_create+0x340/0x340 [spl]
[436442.451737] [] kthread+0xc0/0xd0
[436442.451744] [] ? kthread_create_on_node+0x120/0x120
[436442.451755] [] ret_from_fork+0x7c/0xb0
[436442.451761] [] ? kthread_create_on_node+0x120/0x120
[436442.451786] INFO: task txg_quiesce:2190 blocked for more than 120 seconds.
[436442.465009] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[436442.472455] txg_quiesce D ffff88020fc33f40 0 2190 2 0x00000000
[436442.472459] ffff8801d7c21d68 0000000000000046 ffff8801db45dd00 ffff8801d7c21fd8
[436442.472463] ffff8801d7c21fd8 ffff8801d7c21fd8 ffff880204149740 ffff8801db45dd00
[436442.472481] ffffc90001040258 ffffc900010401f0 ffffc90001040260 0000000000000000
[436442.472487] Call Trace:
[436442.472499] [] schedule+0x29/0x70
[436442.472534] [] cv_wait_common+0xfd/0x1b0 [spl]
[436442.472542] [] ? finish_wait+0x80/0x80
[436442.472558] [] __cv_wait+0x15/0x20 [spl]
[436442.472630] [] txg_quiesce_thread+0x223/0x380 [zfs]
[436442.472701] [] ? txg_fini+0x270/0x270 [zfs]
[436442.472719] [] thread_generic_wrapper+0x78/0x90 [spl]
[436442.472734] [] ? __thread_create+0x340/0x340 [spl]
[436442.472740] [] kthread+0xc0/0xd0
[436442.472747] [] ? kthread_create_on_node+0x120/0x120
[436442.472756] [] ret_from_fork+0x7c/0xb0
[436442.472763] [] ? kthread_create_on_node+0x120/0x120
This is the -daily PPA package on Linux 3.8 / Ubuntu 13.04 server:
Linux wutra 3.8.0-26-generic #38-Ubuntu SMP Mon Jun 17 21:43:33 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
The text was updated successfully, but these errors were encountered: