zpool over a ramdisk causes oops inside brd.c #1107

Closed
cwedgwood opened this issue Nov 24, 2012 · 5 comments

@cwedgwood
Contributor

Kernel: 3.6.x

Using a zpool backed by ramdisks (brd) triggers an oops inside brd.c.

Making the pool:

# 4 ramdisks; rd_size is given in KiB, so each one is 2 GiB
modprobe brd rd_nr=4 rd_size=$((1024*1024*2))
# don't know if these are necessary
zfs set sync=disabled test0
zfs set dedup=on test0

Making the zvol(s):

# create sparse zvol(s) for dedup testing
zfs create -p -s -V 8G -o volblocksize=4K test0/4k/t1
zfs create -p -s -V 8G -o volblocksize=4K test0/4k/t2

dd data into the zvols and after a few seconds:

[  819.017484] kernel BUG at linux/linux/drivers/block/brd.c:76!
[  819.024324] invalid opcode: 0000 [#1] SMP 
[  819.028522] Modules linked in: brd loop ipmi_si ipmi_devintf ipmi_msghandler tcm_loop iscsi_target_mod target_core_mod configfs sg zfs(O) zunicode(O) zavl(O) zcommon(O) znvpair(O) spl(O) coretemp kvm_intel kvm iTCO_wdt iTCO_vendor_support i2c_i801 i2c_core ghash_clmulni_intel lpc_ich mfd_core microcode i7core_edac mpt2sas
[  819.058322] CPU 15 
[  819.060259] Pid: 11783, comm: z_wr_iss/15 Tainted: G           O 3.6.6.cw2 #1 Supermicro X8DTL/X8DTL
[  819.069684] RIP: 0010:[<ffffffffa002f1a3>]  [<ffffffffa002f1a3>] brd_lookup_page+0x2a/0x30 [brd]
[  819.078650] RSP: 0018:ffff88062291f9d0  EFLAGS: 00010293
[  819.084049] RAX: ffffea001705b680 RBX: 00000000000008f5 RCX: 00000000fffffffa
[  819.091260] RDX: 0000000000000000 RSI: 00000000000008f5 RDI: 0000000000000000
[  819.098457] RBP: ffff88062291f9e0 R08: ffff8806023bd868 R09: 0000000000000037
[  819.105817] R10: 0000000000000002 R11: 0000000000000000 R12: ffff88063124b278
[  819.113065] R13: ffff88032af6e980 R14: ffff88063124b200 R15: 0000000000000400
[  819.120253] FS:  0000000000000000(0000) GS:ffff88063fce0000(0000) knlGS:0000000000000000
[  819.128516] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  819.134385] CR2: 00007f31bed8b9de CR3: 0000000001779000 CR4: 00000000000007e0
[  819.141676] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  819.148923] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  819.156143] Process z_wr_iss/15 (pid: 11783, threadinfo ffff88062291c000, task ffff880622440000)
[  819.165102] Stack:
[  819.167146]  0000000000000000 00000000000047aa ffff88062291fa50 ffffffffa002f5b4
[  819.174745]  000000002291fa00 0000000000000000 0000000100000400 ffffea000cabd8c0
[  819.182309]  0000000000000400 0001121000000000 0000000000008030 ffff88063124b200
[  819.190013] Call Trace:
[  819.192505]  [<ffffffffa002f5b4>] brd_make_request+0x125/0x32d [brd]
[  819.198968]  [<ffffffff81243daa>] generic_make_request+0x9f/0xe0
[  819.205068]  [<ffffffff812447de>] submit_bio+0xbb/0xd4
[  819.210372]  [<ffffffffa09b0c41>] __vdev_disk_physio+0x26a/0x2d3 [zfs]
[  819.217123]  [<ffffffffa09b0d69>] vdev_disk_io_start+0xbf/0xd7 [zfs]
[  819.223710]  [<ffffffffa09e30a5>] zio_vdev_io_start+0x211/0x223 [zfs]
[  819.230249]  [<ffffffffa09e1e3a>] zio_nowait+0xdc/0x103 [zfs]
[  819.236181]  [<ffffffffa09b37f3>] vdev_mirror_io_start+0x30a/0x327 [zfs]
[  819.243318]  [<ffffffffa09b3170>] ? vdev_config_sync+0x134/0x134 [zfs]
[  819.250300]  [<ffffffffa09a7e71>] ? spa_config_enter+0xc4/0xde [zfs]
[  819.257040]  [<ffffffffa09e2ee4>] zio_vdev_io_start+0x50/0x223 [zfs]
[  819.263764]  [<ffffffffa09e1e3a>] zio_nowait+0xdc/0x103 [zfs]
[  819.269849]  [<ffffffffa09e280a>] zio_ddt_write+0x41b/0x43f [zfs]
[  819.276305]  [<ffffffffa09e0aa4>] ? zio_ddt_child_write_done+0x92/0x92 [zfs]
[  819.283692]  [<ffffffffa09e0a12>] ? zio_walk_parents+0x66/0x66 [zfs]
[  819.290367]  [<ffffffffa09dff29>] zio_execute+0xb8/0xe0 [zfs]
[  819.296405]  [<ffffffffa08c759b>] taskq_thread+0x2aa/0x4d2 [spl]
[  819.302665]  [<ffffffff81063985>] ? finish_task_switch+0x82/0xad
[  819.308952]  [<ffffffff81067b74>] ? try_to_wake_up+0x1d4/0x1d4
[  819.315043]  [<ffffffffa08c72f1>] ? task_done+0x110/0x110 [spl]
[  819.321232]  [<ffffffff81059601>] kthread+0x6f/0x77
[  819.326391]  [<ffffffff81501984>] kernel_thread_helper+0x4/0x10
[  819.332562]  [<ffffffff814fa29d>] ? retint_restore_args+0x13/0x13
[  819.339004]  [<ffffffff81059592>] ? kthread_freezable_should_stop+0x43/0x43
[  819.346284]  [<ffffffff81501980>] ? gs_change+0x13/0x13
[  819.351813] Code: c3 55 48 89 e5 53 41 50 66 66 66 66 90 48 89 f3 48 83 c7 30 48 c1 eb 03 48 89 de e8 b1 e1 22 e1 48 85 c0 74 08 48 39 58 10 74 02 <0f> 0b 59 5b 5d c3 55 48 89 e5 41 56 41 55 41 54 53 66 66 66 66 
[  819.372907] RIP  [<ffffffffa002f1a3>] brd_lookup_page+0x2a/0x30 [brd]
[  819.379682]  RSP <ffff88062291f9d0>
[  819.383671] ---[ end trace 56c57f3a64767a11 ]---
@cwedgwood
Contributor Author

@behlendorf the oops happens in line 76 of:

    51  /*
    52   * Look up and return a brd's page for a given sector.
    53   */
    54  static DEFINE_MUTEX(brd_mutex);
    55  static struct page *brd_lookup_page(struct brd_device *brd, sector_t sector)
    56  {
    57          pgoff_t idx;
    58          struct page *page;
    59  
    60          /*
    61           * The page lifetime is protected by the fact that we have opened the
    62           * device node -- brd pages will never be deleted under us, so we
    63           * don't need any further locking or refcounting.
    64           *
    65           * This is strictly true for the radix-tree nodes as well (ie. we
    66           * don't actually need the rcu_read_lock()), however that is not a
    67           * documented feature of the radix-tree API so it is better to be
    68           * safe here (we don't have total exclusion from radix tree updates
    69           * here, only deletes).
    70           */
    71          rcu_read_lock();
    72          idx = sector >> PAGE_SECTORS_SHIFT; /* sector to page index */
    73          page = radix_tree_lookup(&brd->brd_pages, idx);
    74          rcu_read_unlock();
    75  
 -> 76          BUG_ON(page && page->index != idx);
    77  
    78          return page;
    79  }
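
For readers following the listing, the sector-to-page-index step on line 72 can be worked through in isolation. This is only a userspace sketch: it assumes 512-byte sectors and 4 KiB pages (so the shift is 3), and the constant `PAGE_SHIFT_ASSUMED` and both helper names are made up for illustration. The input 0x47aa is a value that appears in the stack dump above and 0x8f5 is what RBX/RSI hold; treating one as the sector and the other as the index is an inference from the oops, not something stated in the report.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: 512-byte sectors and 4 KiB pages (typical for x86-64). */
#define SECTOR_SHIFT        9
#define PAGE_SHIFT_ASSUMED  12
#define PAGE_SECTORS_SHIFT  (PAGE_SHIFT_ASSUMED - SECTOR_SHIFT)  /* 3 */

/* Map a 512-byte sector number to the index of the page that holds it. */
static uint64_t sector_to_page_index(uint64_t sector)
{
    return sector >> PAGE_SECTORS_SHIFT;
}

/* Byte offset of the sector inside its page. */
static unsigned sector_offset_in_page(uint64_t sector)
{
    unsigned sectors_per_page = 1u << PAGE_SECTORS_SHIFT;
    unsigned offset = (unsigned)(sector & (sectors_per_page - 1)) << SECTOR_SHIFT;

    /* The offset must always fall inside one page. */
    assert(offset < (1u << PAGE_SHIFT_ASSUMED));
    return offset;
}
```

Under these assumptions, `sector_to_page_index(0x47aa)` is 0x8f5 and `sector_offset_in_page(0x47aa)` is 1024 (0x400), so eight consecutive sectors share one page and one `index` value.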

@behlendorf
Contributor

This issue was caused by a long-standing bug in the kernel's brd driver. ZFS just happens to trigger it fairly reliably due to the large number of concurrent accesses to the device from the z_wr_* threads. I'll submit the fix upstream; in the meantime, the following fix closes the race and resolves the problem.

http://bpaste.net/show/97497/
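
The pasted patch is not reproduced in this thread, so the exact change is not shown here. As a rough illustration of how a lookup could trip a check like `BUG_ON(page && page->index != idx)`, the sketch below assumes one plausible ordering problem: a page becomes visible to lookups before its `index` field is set. Everything in it is hypothetical (`fake_page`, the flat `slots` table standing in for the radix tree, the `observer_fn` hook standing in for a concurrent reader), and it runs single-threaded so the interleaving is deterministic.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 16

/* Hypothetical stand-in for struct page: only the field the check reads. */
struct fake_page {
    unsigned long index;
};

/* Hypothetical stand-in for the radix tree: a flat table of slots. */
static struct fake_page *slots[NSLOTS];

/* Same condition as the BUG_ON: a page that is found must carry its own index.
 * Returns 1 if the lookup would pass, 0 if it would hit the BUG. */
static int lookup_is_consistent(unsigned long idx)
{
    struct fake_page *page = slots[idx];
    return !(page && page->index != idx);
}

/* Hook that runs where a concurrent lookup could observe the table. */
typedef int (*observer_fn)(unsigned long idx);

/* Publish first, initialize second: leaves a window for a bad lookup. */
static int insert_publish_first(struct fake_page *page, unsigned long idx,
                                observer_fn observe)
{
    int ok;

    assert(idx < NSLOTS);
    slots[idx] = page;      /* page is now visible to lookups */
    ok = observe(idx);      /* a concurrent lookup lands in the window */
    page->index = idx;      /* too late for that lookup */
    return ok;
}

/* Initialize first, publish second: a lookup never sees a stale index. */
static int insert_init_first(struct fake_page *page, unsigned long idx,
                             observer_fn observe)
{
    assert(idx < NSLOTS);
    page->index = idx;      /* fully set up before anyone can find it */
    slots[idx] = page;
    return observe(idx);
}
```

With a page whose `index` starts out stale (say 99), `insert_publish_first(&p, 5, lookup_is_consistent)` returns 0, while `insert_init_first(&q, 6, lookup_is_consistent)` returns 1. In real concurrent code the ordering would also have to be enforced with locking or memory barriers; this sketch only shows the logical window.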

@behlendorf
Contributor

Posted upstream to LKML.

https://lkml.org/lkml/2013/5/10/424

@cwedgwood
Contributor Author

Fixed by @behlendorf upstream.

@behlendorf
Contributor

@cwedgwood Thanks for closing these out. I'd completely forgotten about some of these things that were fixed.

pcd1193182 pushed a commit to pcd1193182/zfs that referenced this issue Sep 26, 2023
The previous patch openzfs#14841 appeared to have a significant flaw, causing
deadlocks if the zl_get_data callback got blocked waiting for TXG sync.  I
already handled some such cases in the original patch, but issue
openzfs#14982 showed cases that were impossible to solve in that design.

This patch fixes the problem by postponing log block allocation until
the very end, just before the zios are issued, leaving nothing blocking
after that point to cause deadlocks.  Before that point, though, any
sleeps are now allowed and do not block the sync thread.  This requires a
slightly more complicated lwb state machine to allocate blocks and issue
zios in the proper order.  But with the removal of the special early-issue
workarounds, the new code is much cleaner and should even be more efficient.

Since this patch uses null zios between writes, I've found that null
zios do not wait for the logical children's ready status in zio_ready(),
which lets the parent write proceed prematurely, producing incorrect
log blocks.  Adding ZIO_CHILD_LOGICAL_BIT to zio_wait_for_children()
fixes it.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.

Co-authored-by: Alexander Motin <mav@FreeBSD.org>