This is a false positive. The full lockdep warning is quoted below. The gist of it is:

the existing dependency chain (in reverse order) is:

-> #3 (&dp->dp_lock){+.+.}:
       dsl_pool_dirty_space+0x70/0x2d0 [zfs]
       dbuf_dirty+0x778/0x31d0 [zfs]

-> #2 (&db->db_mtx){+.+.}:
       dmu_buf_will_dirty+0x6b/0x6c0 [zfs]
       bpobj_iterate_impl+0xbe6/0x1410 [zfs]
       spa_sync+0x1756/0x29b0 [zfs]
       txg_sync_thread+0x6d6/0x1370 [zfs]

-> #1 (&bpo->bpo_lock){+.+.}:
       bpobj_space+0x63/0x470 [zfs]
       dsl_scan_active+0x340/0x3d0 [zfs]
       txg_sync_thread+0x3f2/0x1370 [zfs]

-> #0 (&tx->tx_sync_lock){+.+.}:
       __mutex_lock+0xef/0x1380
       txg_kick+0x61/0x420 [zfs]
       dsl_pool_need_dirty_delay+0x1c7/0x3f0 [zfs]

Different instances of txg_sync_thread manage different dsl_pools, which contain different bpo_lock instances. Stacks #1 and #2 therefore cannot be referring to the same bpo_lock instance, so this cannot be an actual lock-ordering loop.

The patch adds an ID field (bpo_id) that is unique to each bpo_lock instance. This bpo_id is passed as the nesting class to the mutex_enter_nested() call in bpobj_space(), which effectively makes that locking call invisible to lockdep.
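
For illustration, a minimal sketch of that approach follows. This is not the literal patch: apart from bpo_lock, bpo_id, bpobj_space() and mutex_enter_nested(), every name in it is an assumption made for the example.

/*
 * Sketch only -- assumes the usual ZFS kernel context (sys/zfs_context.h).
 * The struct is abbreviated; only the fields relevant here are shown.
 */
typedef struct bpobj {
        kmutex_t        bpo_lock;
        uint64_t        bpo_id;         /* unique per bpobj instance */
        /* ... remaining bpobj fields elided ... */
} bpobj_t;

/* Assumed helper: hand out a unique ID when a bpobj is set up. */
static volatile uint64_t bpobj_next_id;

static void
bpobj_assign_id(bpobj_t *bpo)
{
        bpo->bpo_id = atomic_inc_64_nv(&bpobj_next_id);
}

/*
 * Assumed shape of the bpobj_space() change: take bpo_lock with a
 * per-instance nesting class rather than the default one.  Per the
 * description above, this effectively hides this acquisition from
 * lockdep and breaks the reported (false) cycle through &bpo->bpo_lock.
 */
static void
bpobj_space_locking_sketch(bpobj_t *bpo)
{
        mutex_enter_nested(&bpo->bpo_lock, (int)bpo->bpo_id);
        /* ... read the space accounting from bpo->bpo_phys under the lock ... */
        mutex_exit(&bpo->bpo_lock);
}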

[ 125.297679] ======================================================
[ 125.301361] hardirqs last enabled at (28216159): [<ffffffff885cef12>] handle_mm_fault+0x582/0x7d0
[ 125.302863] WARNING: possible circular locking dependency detected
[ 125.302865] 4.19.29-4.19.0-debug-99d9c44b25c08f51 #1 Tainted: G O
[ 125.309056] hardirqs last disabled at (28216160): [<ffffffff88530f22>] get_page_from_freelist+0x3b2/0x31a0
[ 125.318009] ------------------------------------------------------
[ 125.318011] mdtagent/13103 is trying to acquire lock:
[ 125.324203] softirqs last enabled at (28215144): [<ffffffff89c00631>] __do_softirq+0x631/0x868
[ 125.331591] 00000000679a2c2b (
[ 125.341249] softirqs last disabled at (28215139): [<ffffffff8813ef80>] irq_exit+0x150/0x170
[ 125.347425] &tx->tx_sync_lock){+.+.}, at: txg_kick+0x61/0x420 [zfs]
[ 125.378920] but task is already holding lock:
[ 125.384762] 00000000ec7e955e (&dp->dp_lock){+.+.}, at: dsl_pool_need_dirty_delay+0x8b/0x3f0 [zfs]
[ 125.393679] which lock already depends on the new lock.
[ 125.401861] the existing dependency chain (in reverse order) is:
[ 125.409347] -> #3 (&dp->dp_lock){+.+.}:
[ 125.414710] dsl_pool_dirty_space+0x70/0x2d0 [zfs]
[ 125.420068] dbuf_dirty+0x778/0x31d0 [zfs]
[ 125.424723] dmu_write_uio_dnode+0xfe/0x300 [zfs]
[ 125.429990] dmu_write_uio_dbuf+0xa0/0x100 [zfs]
[ 125.435174] zfs_write+0x18bd/0x1f70 [zfs]
[ 125.439837] zpl_write_common_iovec+0x15e/0x380 [zfs]
[ 125.445452] zpl_iter_write+0x1c6/0x2a0 [zfs]
[ 125.450334] __vfs_write+0x3a2/0x5b0
[ 125.454440] vfs_write+0x15d/0x460
[ 125.458365] ksys_write+0xb1/0x170
[ 125.462295] do_syscall_64+0x9b/0x400
[ 125.466491] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 125.472069] -> #2 (&db->db_mtx){+.+.}:
[ 125.477337] dmu_buf_will_dirty+0x6b/0x6c0 [zfs]
[ 125.482509] bpobj_iterate_impl+0xbe6/0x1410 [zfs]
[ 125.487863] spa_sync+0x1756/0x29b0 [zfs]
[ 125.492442] txg_sync_thread+0x6d6/0x1370 [zfs]
[ 125.497499] thread_generic_wrapper+0x1ff/0x2a0 [spl]
[ 125.503080] kthread+0x2e7/0x3e0
[ 125.506839] ret_from_fork+0x24/0x50
[ 125.510937] -> #1 (&bpo->bpo_lock){+.+.}:
[ 125.516464] bpobj_space+0x63/0x470 [zfs]
[ 125.521038] dsl_scan_active+0x340/0x3d0 [zfs]
[ 125.526049] txg_sync_thread+0x3f2/0x1370 [zfs]
[ 125.531106] thread_generic_wrapper+0x1ff/0x2a0 [spl]
[ 125.536684] kthread+0x2e7/0x3e0
[ 125.540439] ret_from_fork+0x24/0x50
[ 125.544535] -> #0 (&tx->tx_sync_lock){+.+.}:
[ 125.550295] __mutex_lock+0xef/0x1380
[ 125.554529] txg_kick+0x61/0x420 [zfs]
[ 125.558841] dsl_pool_need_dirty_delay+0x1c7/0x3f0 [zfs]
[ 125.564715] dmu_tx_assign+0x13b/0xe40 [zfs]
[ 125.569557] zfs_write+0xb5e/0x1f70 [zfs]
[ 125.574130] zpl_write_common_iovec+0x15e/0x380 [zfs]
[ 125.579746] zpl_iter_write+0x1c6/0x2a0 [zfs]
[ 125.584628] __vfs_write+0x3a2/0x5b0
[ 125.588726] vfs_write+0x15d/0x460
[ 125.592654] ksys_write+0xb1/0x170
[ 125.596580] do_int80_syscall_32+0xf1/0x460
[ 125.601287] entry_INT80_compat+0x8a/0xa0
[ 125.605817] other info that might help us debug this:
[ 125.613825] Chain exists of: &tx->tx_sync_lock --> &db->db_mtx --> &dp->dp_lock
[ 125.624174] Possible unsafe locking scenario:
[ 125.630099]        CPU0                    CPU1
[ 125.634633]        ----                    ----
[ 125.639166]   lock(&dp->dp_lock);
[ 125.642485]                                lock(&db->db_mtx);
[ 125.648232]                                lock(&dp->dp_lock);
[ 125.654071]   lock(&tx->tx_sync_lock);
[ 125.657823] *** DEADLOCK ***
[ 125.663743] 3 locks held by mdtagent/13103:
[ 125.667928] #0: 0000000016b80a9c (&f->f_pos_lock){+.+.}, at: __fdget_pos+0xb1/0xe0
[ 125.675591] #1: 00000000d0227133 (sb_writers#12){.+.+}, at: vfs_write+0x332/0x460
[ 125.683167] #2: 00000000ec7e955e (&dp->dp_lock){+.+.}, at: dsl_pool_need_dirty_delay+0x8b/0x3f0 [zfs]
[ 125.692517] stack backtrace:
[ 125.696878] CPU: 28 PID: 13103 Comm: mdtagent Tainted: G O 4.19.29-4.19.0-debug-99d9c44b25c08f51 #1
[ 125.707226] Hardware name: Ciara Technologies 1x16-X7p-L-team/X7P-MB, BIOS P2.12 03/05/2019
[ 125.715578] Call Trace:
[ 125.718038] dump_stack+0x91/0xeb
[ 125.721364] print_circular_bug.isra.16+0x30b/0x5b0
[ 125.726249] ? save_trace+0xd6/0x240
[ 125.729827] __lock_acquire+0x41be/0x4f10
[ 125.733841] ? debug_show_all_locks+0x2d0/0x2d0
[ 125.738373] ? debug_show_all_locks+0x2d0/0x2d0
[ 125.742907] ? kernel_text_address+0x6d/0x100
[ 125.747276] ? __save_stack_trace+0x73/0xd0
[ 125.751468] ? quarantine_put+0xb0/0x150
[ 125.755396] ? lock_acquire+0x153/0x330
[ 125.759242] lock_acquire+0x153/0x330
[ 125.762966] ? txg_kick+0x61/0x420 [zfs]
[ 125.766896] ? lock_contended+0xd60/0xd60
[ 125.770958] ? txg_kick+0x61/0x420 [zfs]
[ 125.774885] __mutex_lock+0xef/0x1380
[ 125.778603] ? txg_kick+0x61/0x420 [zfs]
[ 125.782572] ? txg_kick+0x61/0x420 [zfs]
[ 125.786500] ? __mutex_add_waiter+0x160/0x160
[ 125.790866] ? sched_clock+0x5/0x10
[ 125.794359] ? __mutex_add_waiter+0x160/0x160
[ 125.798721] ? __cv_destroy+0x9e/0x180 [spl]
[ 125.803002] ? cv_destroy_wakeup+0xc0/0xc0 [spl]
[ 125.807628] ? __mutex_unlock_slowpath+0xf3/0x660
[ 125.812384] ? zio_wait+0x419/0x660 [zfs]
[ 125.816442] ? txg_kick+0x61/0x420 [zfs]
[ 125.820412] txg_kick+0x61/0x420 [zfs]
[ 125.824209] dsl_pool_need_dirty_delay+0x1c7/0x3f0 [zfs]
[ 125.829563] dmu_tx_assign+0x13b/0xe40 [zfs]
[ 125.833872] ? dmu_tx_count_write+0x42f/0x700 [zfs]
[ 125.838800] zfs_write+0xb5e/0x1f70 [zfs]
[ 125.842816] ? sched_clock+0x5/0x10
[ 125.846358] ? zfs_close+0x1e0/0x1e0 [zfs]
[ 125.850457] ? sched_clock+0x5/0x10
[ 125.853951] ? sched_clock_cpu+0x18/0x170
[ 125.857971] ? check_flags.part.23+0x480/0x480
[ 125.862418] ? __lock_acquire+0xe3b/0x4f10
[ 125.866526] ? __fget+0x2d6/0x3e0
[ 125.869852] ? sched_clock+0x5/0x10
[ 125.873345] ? sched_clock_cpu+0x18/0x170
[ 125.877402] zpl_write_common_iovec+0x15e/0x380 [zfs]
[ 125.882453] ? __lock_acquire+0xe3b/0x4f10
[ 125.886596] ? zpl_read_common_iovec+0x2b0/0x2b0 [zfs]
[ 125.891779] zpl_iter_write+0x1c6/0x2a0 [zfs]
[ 125.896139] __vfs_write+0x3a2/0x5b0
[ 125.899717] ? __fdget_pos+0xb1/0xe0
[ 125.903297] ? kernel_read+0x130/0x130
[ 125.907049] ? __mutex_add_waiter+0x160/0x160
[ 125.911407] ? lock_acquire+0x153/0x330
[ 125.915248] ? lock_acquire+0x153/0x330
[ 125.919089] ? rcu_read_lock_sched_held+0xeb/0x120
[ 125.923887] ? rcu_sync_lockdep_assert+0x76/0xb0
[ 125.928509] ? __sb_start_write+0xf0/0x260
[ 125.932614] vfs_write+0x15d/0x460
[ 125.936022] ksys_write+0xb1/0x170
[ 125.939425] ? __ia32_sys_read+0xb0/0xb0
[ 125.943353] ? poll_select_set_timeout+0xc0/0xc0
[ 125.947974] ? trace_hardirqs_on_thunk+0x1a/0x1c
[ 125.952601] ? do_int80_syscall_32+0x17/0x460
[ 125.956960] do_int80_syscall_32+0xf1/0x460
[ 125.961145] entry_INT80_compat+0x8a/0xa0