zpool over a ramdisk causes oops inside brd.c #1107

cwedgwood · 2012-11-24T02:11:10Z

Kernel 3.6.x (3.6.6 in the trace below).

Using a zpool over a ramdisk triggers an oops inside brd.c.

Making the pool:

modprobe brd rd_nr=4 rd_size=$((1024*1024*2))
# don't know if these are necessary
zfs set sync=disabled test0
zfs set dedup=on test0

Making the zvol(s):

# create zvol(s) for dedup testing
zfs create -p -s -V8G -o volblocksize=4K test0/4k/t1
zfs create -p -s -V8G -o volblocksize=4K test0/4k/t2

dd data into the zvols and after a few seconds:

[  819.017484] kernel BUG at linux/linux/drivers/block/brd.c:76!
[  819.024324] invalid opcode: 0000 [#1] SMP 
[  819.028522] Modules linked in: brd loop ipmi_si ipmi_devintf ipmi_msghandler tcm_loop iscsi_target_mod target_core_mod configfs sg zfs(O) zunicode(O) zavl(O) zcommon(O) znvpair(O) spl(O) coretemp kvm_intel kvm iTCO_wdt iTCO_vendor_support i2c_i801 i2c_core ghash_clmulni_intel lpc_ich mfd_core microcode i7core_edac mpt2sas
[  819.058322] CPU 15 
[  819.060259] Pid: 11783, comm: z_wr_iss/15 Tainted: G           O 3.6.6.cw2 #1 Supermicro X8DTL/X8DTL
[  819.069684] RIP: 0010:[<ffffffffa002f1a3>]  [<ffffffffa002f1a3>] brd_lookup_page+0x2a/0x30 [brd]
[  819.078650] RSP: 0018:ffff88062291f9d0  EFLAGS: 00010293
[  819.084049] RAX: ffffea001705b680 RBX: 00000000000008f5 RCX: 00000000fffffffa
[  819.091260] RDX: 0000000000000000 RSI: 00000000000008f5 RDI: 0000000000000000
[  819.098457] RBP: ffff88062291f9e0 R08: ffff8806023bd868 R09: 0000000000000037
[  819.105817] R10: 0000000000000002 R11: 0000000000000000 R12: ffff88063124b278
[  819.113065] R13: ffff88032af6e980 R14: ffff88063124b200 R15: 0000000000000400
[  819.120253] FS:  0000000000000000(0000) GS:ffff88063fce0000(0000) knlGS:0000000000000000
[  819.128516] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  819.134385] CR2: 00007f31bed8b9de CR3: 0000000001779000 CR4: 00000000000007e0
[  819.141676] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  819.148923] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  819.156143] Process z_wr_iss/15 (pid: 11783, threadinfo ffff88062291c000, task ffff880622440000)
[  819.165102] Stack:
[  819.167146]  0000000000000000 00000000000047aa ffff88062291fa50 ffffffffa002f5b4
[  819.174745]  000000002291fa00 0000000000000000 0000000100000400 ffffea000cabd8c0
[  819.182309]  0000000000000400 0001121000000000 0000000000008030 ffff88063124b200
[  819.190013] Call Trace:
[  819.192505]  [<ffffffffa002f5b4>] brd_make_request+0x125/0x32d [brd]
[  819.198968]  [<ffffffff81243daa>] generic_make_request+0x9f/0xe0
[  819.205068]  [<ffffffff812447de>] submit_bio+0xbb/0xd4
[  819.210372]  [<ffffffffa09b0c41>] __vdev_disk_physio+0x26a/0x2d3 [zfs]
[  819.217123]  [<ffffffffa09b0d69>] vdev_disk_io_start+0xbf/0xd7 [zfs]
[  819.223710]  [<ffffffffa09e30a5>] zio_vdev_io_start+0x211/0x223 [zfs]
[  819.230249]  [<ffffffffa09e1e3a>] zio_nowait+0xdc/0x103 [zfs]
[  819.236181]  [<ffffffffa09b37f3>] vdev_mirror_io_start+0x30a/0x327 [zfs]
[  819.243318]  [<ffffffffa09b3170>] ? vdev_config_sync+0x134/0x134 [zfs]
[  819.250300]  [<ffffffffa09a7e71>] ? spa_config_enter+0xc4/0xde [zfs]
[  819.257040]  [<ffffffffa09e2ee4>] zio_vdev_io_start+0x50/0x223 [zfs]
[  819.263764]  [<ffffffffa09e1e3a>] zio_nowait+0xdc/0x103 [zfs]
[  819.269849]  [<ffffffffa09e280a>] zio_ddt_write+0x41b/0x43f [zfs]
[  819.276305]  [<ffffffffa09e0aa4>] ? zio_ddt_child_write_done+0x92/0x92 [zfs]
[  819.283692]  [<ffffffffa09e0a12>] ? zio_walk_parents+0x66/0x66 [zfs]
[  819.290367]  [<ffffffffa09dff29>] zio_execute+0xb8/0xe0 [zfs]
[  819.296405]  [<ffffffffa08c759b>] taskq_thread+0x2aa/0x4d2 [spl]
[  819.302665]  [<ffffffff81063985>] ? finish_task_switch+0x82/0xad
[  819.308952]  [<ffffffff81067b74>] ? try_to_wake_up+0x1d4/0x1d4
[  819.315043]  [<ffffffffa08c72f1>] ? task_done+0x110/0x110 [spl]
[  819.321232]  [<ffffffff81059601>] kthread+0x6f/0x77
[  819.326391]  [<ffffffff81501984>] kernel_thread_helper+0x4/0x10
[  819.332562]  [<ffffffff814fa29d>] ? retint_restore_args+0x13/0x13
[  819.339004]  [<ffffffff81059592>] ? kthread_freezable_should_stop+0x43/0x43
[  819.346284]  [<ffffffff81501980>] ? gs_change+0x13/0x13
[  819.351813] Code: c3 55 48 89 e5 53 41 50 66 66 66 66 90 48 89 f3 48 83 c7 30 48 c1 eb 03 48 89 de e8 b1 e1 22 e1 48 85 c0 74 08 48 39 58 10 74 02 <0f> 0b 59 5b 5d c3 55 48 89 e5 41 56 41 55 41 54 53 66 66 66 66 
[  819.372907] RIP  [<ffffffffa002f1a3>] brd_lookup_page+0x2a/0x30 [brd]
[  819.379682]  RSP <ffff88062291f9d0>
[  819.383671] ---[ end trace 56c57f3a64767a11 ]---


cwedgwood · 2012-11-24T02:13:21Z

@behlendorf the oops happens in line 76 of:

    51  /*
    52   * Look up and return a brd's page for a given sector.
    53   */
    54  static DEFINE_MUTEX(brd_mutex);
    55  static struct page *brd_lookup_page(struct brd_device *brd, sector_t sector)
    56  {
    57          pgoff_t idx;
    58          struct page *page;
    59  
    60          /*
    61           * The page lifetime is protected by the fact that we have opened the
    62           * device node -- brd pages will never be deleted under us, so we
    63           * don't need any further locking or refcounting.
    64           *
    65           * This is strictly true for the radix-tree nodes as well (ie. we
    66           * don't actually need the rcu_read_lock()), however that is not a
    67           * documented feature of the radix-tree API so it is better to be
    68           * safe here (we don't have total exclusion from radix tree updates
    69           * here, only deletes).
    70           */
    71          rcu_read_lock();
    72          idx = sector >> PAGE_SECTORS_SHIFT; /* sector to page index */
    73          page = radix_tree_lookup(&brd->brd_pages, idx);
    74          rcu_read_unlock();
    75  
 -> 76          BUG_ON(page && page->index != idx);
    77  
    78          return page;
    79  }
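
To make the check at line 76 concrete, here is a minimal userspace sketch of the same arithmetic. It assumes 4 KiB pages and 512-byte sectors, so the shift is 3. The function names are hypothetical and exist only for illustration. One observation, offered as an inference rather than a confirmed reading of the trace: the stack dump contains the word 0x47aa, and 0x47aa >> 3 is 0x8f5, which matches the value in RBX and RSI.

```c
#include <stdint.h>

/* Shift values for 512-byte sectors and (assumed) 4 KiB pages. */
#define SECTOR_SHIFT        9
#define PAGE_SHIFT          12
#define PAGE_SECTORS_SHIFT  (PAGE_SHIFT - SECTOR_SHIFT)  /* 8 sectors per page */

/* Same arithmetic as line 72 of the listing: sector number to page index. */
uint64_t sector_to_page_index(uint64_t sector)
{
    return sector >> PAGE_SECTORS_SHIFT;
}

/*
 * Mirrors the condition inside the BUG_ON at line 76 (hypothetical helper):
 * returns nonzero when a page was found but the index stored in it does not
 * match the index that was just looked up.
 */
int lookup_would_bug(int page_found, uint64_t page_index, uint64_t sector)
{
    return page_found && page_index != sector_to_page_index(sector);
}
```

With these definitions, `lookup_would_bug(1, 0, 0x47aa)` is nonzero (a page is present but carries the wrong index), while a missing page or a page whose index is 0x8f5 passes the check.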

behlendorf · 2013-05-09T00:01:55Z

This issue was caused by a long-standing bug in the kernel's brd driver. ZFS just happens to trigger it fairly reliably because of the large number of concurrent accesses to the device from the z_wr_* threads. I'll submit the fix upstream; in the meantime, the following patch closes the race and resolves the problem.

http://bpaste.net/show/97497/
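
The thread links to the patch rather than quoting it, so the sketch below is not the patch. It is a single-threaded userspace model of one ordering that would produce this symptom, assuming the race is between making a page visible to lookups and storing its index. The array `slots`, the struct, and every function name here are hypothetical stand-ins, not kernel APIs, and a real fix would also need the kernel's own locking and memory-ordering rules.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the field the check reads. */
struct page {
    unsigned long index;
};

#define NSLOTS 4096

/* Hypothetical stand-in for the per-device radix tree of pages. */
struct page *slots[NSLOTS];

/* Models the lookup plus the BUG_ON condition: found, but wrong index. */
int lookup_sees_bad_index(unsigned long idx)
{
    struct page *p = slots[idx];
    return p != NULL && p->index != idx;
}

/*
 * Racy ordering (assumed): the page becomes visible before its index is
 * set. The lookup call in the middle simulates a concurrent reader running
 * in that window. Returns nonzero if the reader would have hit the check.
 */
int insert_publish_first(struct page *p, unsigned long idx)
{
    int bad;

    slots[idx] = p;                    /* page is now visible to lookups */
    bad = lookup_sees_bad_index(idx);  /* simulated concurrent lookup */
    p->index = idx;                    /* index set too late */
    return bad;
}

/*
 * Safe ordering: initialise the index first, then make the page visible.
 * A reader that finds the page always sees a matching index.
 */
int insert_init_first(struct page *p, unsigned long idx)
{
    p->index = idx;
    slots[idx] = p;
    return lookup_sees_bad_index(idx); /* simulated concurrent lookup */
}
```

In this model, inserting a page whose index starts at a stale value with `insert_publish_first` reports a mismatch, while `insert_init_first` never does. Many z_wr_* threads writing at once would make such a window much easier to hit.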

behlendorf · 2013-05-13T21:01:45Z

Posted upstream to LKML.

https://lkml.org/lkml/2013/5/10/424

cwedgwood · 2014-05-27T00:50:59Z

Fixed by @behlendorf upstream.

behlendorf · 2014-05-30T23:36:47Z

@cwedgwood Thanks for closing these out, I'd completely forgotten about some of these things which were fixed.

The previous patch openzfs#14841 turned out to have a significant flaw: it caused deadlocks if the zl_get_data callback blocked waiting for TXG sync. I had already handled some such cases in the original patch, but issue openzfs#14982 showed cases that were impossible to solve in that design.

This patch fixes the problem by postponing log block allocation until the very end, just before the zios are issued, leaving nothing after that point that can block and cause deadlocks. Before that point, any sleeps are now allowed and do not block the sync thread. This requires a slightly more complicated lwb state machine to allocate blocks and issue zios in the proper order. But with the special early-issue workarounds removed, the new code is much cleaner and should even be more efficient.

Since this patch uses null zios between writes, I found that null zios do not wait for the logical children's ready status in zio_ready(), which lets the parent write proceed prematurely and produce incorrect log blocks. Adding ZIO_CHILD_LOGICAL_BIT to zio_wait_for_children() fixes it.

Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Co-authored-by: Alexander Motin <mav@FreeBSD.org>

cwedgwood closed this as completed May 27, 2014
