
Fix range locking in ZIL commit codepath #6477

Merged (1 commit) Aug 21, 2017

Conversation

loli10K (Contributor) commented Aug 8, 2017

Description

I'm trying to reproduce the issue described in #6238 reliably so we can add a new test case when it's fixed.

This PR will be updated with a proper fix when the code is ready.

Motivation and Context

Investigate #6238
#6315 and #6356 also seem to be duplicates of the same issue.

How Has This Been Tested?

Local QEMU/KVM hypervisor with ZVOLs used directly (VirtIO) and exported via iSCSI. This needs more investigation.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

@loli10K loli10K added the Status: Work in Progress Not yet ready for general review label Aug 8, 2017
@loli10K loli10K force-pushed the issue-6238 branch 2 times, most recently from a4e4fc8 to a360784 Compare August 8, 2017 20:55
loli10K (Contributor, Author) commented Aug 11, 2017

UPDATE: it seems this issue has been introduced (or it's more likely to be triggered) with OpenZFS 7578 - Fix/improve some aspects of ZIL writing (1b7c1e5).

When we have a ZVOL with logbias=throughput we force itx->itx_wr_state to WR_INDIRECT and set the itx->itx_lr offset and length to the offset and length of the BIO from zvol_write()->zvol_log_write(): this offset and length are later used to take a range lock in zilog->zl_get_data/zvol_get_data(). The length is actually set to MIN(blocksize - P2PHASE(offset, blocksize), size), but I don't think this really changes anything in this context.

Now suppose we have a ZVOL with blocksize=8K and push 4K writes to offset 0: we will only be range-locking 0-4096. This means the ASSERTion we make in dbuf_unoverride() is no longer valid because now dmu_sync() is called from zilog's get_data functions holding a partial lock on the dbuf.

Even though it was initially thought this was somehow related to some combination of virtualization settings and storage drivers, this is actually reproducible on a file-based vdev pool with fio (or even dd): we just have to push writes to a small range (0-8K) of the same ZVOL device fast enough to trigger the race condition.

root@linux:~# POOLNAME="poolname"
root@linux:~# TMPDIR='/var/tmp'
root@linux:~# #
root@linux:~# mountpoint -q $TMPDIR || mount -t tmpfs tmpfs $TMPDIR
root@linux:~# zpool destroy $POOLNAME
cannot open 'poolname': no such pool
root@linux:~# rm -f $TMPDIR/zpool.dat
root@linux:~# fallocate -l 128m $TMPDIR/zpool.dat
root@linux:~# zpool create $POOLNAME $TMPDIR/zpool.dat
root@linux:~# zfs create -V 64M -s -o sync=always -o logbias=throughput -b 128K $POOLNAME/zvol
root@linux:~# #
root@linux:~# fio --filename=/dev/zvol/$POOLNAME/zvol --sync=1 --rw=write --bs=8k --size=8k --io_limit=1m --numjobs=2 --name=issue-6238
issue-6238: (g=0): rw=write, bs=8K-8K/8K-8K/8K-8K, ioengine=sync, iodepth=1
...
fio-2.1.11
Starting 2 processes
[   89.823158] VERIFY(dr->dt.dl.dr_override_state != DR_IN_DMU_SYNC) failed
[   89.824208] PANIC at dbuf.c:1308:dbuf_unoverride()

Message from syslogd@linux at Aug 11 22:02:55 ...
 kernel:[   89.823158] VERIFY(dr->dt.dl.dr_override_state != DR_IN_DMU_SYNC) failed

Message from syslogd@linux at Aug 11 22:02:55 ...
 kernel:[   89.824208] PANIC at dbuf.c:1308:dbuf_unoverride()
^ZJobs: 2 (f=2): [W(2)] [7.1% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 00m:39s]
[1]+  Stopped                 fio --filename=/dev/zvol/$POOLNAME/zvol --sync=1 --rw=write --bs=8k --size=8k --io_limit=1m --numjobs=2 --name=issue-6238
root@linux:~# dmesg 
[   89.823158] VERIFY(dr->dt.dl.dr_override_state != DR_IN_DMU_SYNC) failed
[   89.824208] PANIC at dbuf.c:1308:dbuf_unoverride()
[   89.824862] Showing stack for process 1930
[   89.824865] CPU: 5 PID: 1930 Comm: zvol Tainted: G           OE   4.6.4 #1
[   89.824866]  0000000000000286 0000000081f2fdd7 ffffffff81315815 ffffffffc07236e0
[   89.824869]  ffff88002a7cbc60 ffffffffc04749e9 ffff88002b6e83c0 0000000000000028
[   89.824871]  ffff88002a7cbc70 ffff88002a7cbc10 6428594649524556 6c642e74643e2d72
[   89.824874] Call Trace:
[   89.824879]  [<ffffffff81315815>] ? dump_stack+0x5c/0x77
[   89.824888]  [<ffffffffc04749e9>] ? spl_panic+0xc9/0x110 [spl]
[   89.824892]  [<ffffffff815ca39e>] ? mutex_lock+0xe/0x30
[   89.824894]  [<ffffffff815ca39e>] ? mutex_lock+0xe/0x30
[   89.824948]  [<ffffffffc06857ce>] ? zio_ready+0x1de/0x610 [zfs]
[   89.824993]  [<ffffffffc068021e>] ? zio_wait+0x20e/0x410 [zfs]
[   89.825022]  [<ffffffffc0582a97>] ? dbuf_unoverride+0x157/0x190 [zfs]
[   89.825050]  [<ffffffffc0582c95>] ? dbuf_redirty+0x35/0xc0 [zfs]
[   89.825079]  [<ffffffffc058d266>] ? dmu_buf_will_dirty+0x96/0x1d0 [zfs]
[   89.825109]  [<ffffffffc0596564>] ? dmu_write_uio_dnode+0x74/0x1d0 [zfs]
[   89.825140]  [<ffffffffc0598778>] ? dmu_write_uio_dbuf+0x48/0x60 [zfs]
[   89.825184]  [<ffffffffc0691a70>] ? zvol_write+0x190/0x600 [zfs]
[   89.825190]  [<ffffffffc04725f3>] ? taskq_thread+0x293/0x580 [spl]
[   89.825193]  [<ffffffff810a5730>] ? wake_up_q+0x60/0x60
[   89.825198]  [<ffffffffc0472360>] ? taskq_thread_spawn+0x50/0x50 [spl]
[   89.825200]  [<ffffffff8109ac4f>] ? kthread+0xdf/0x100
[   89.825203]  [<ffffffff815cc3f2>] ? ret_from_fork+0x22/0x40
[   89.825205]  [<ffffffff8109ab70>] ? kthread_park+0x50/0x50
root@linux:~# 

Without --enable-debug:

root@linux:~# POOLNAME="poolname"
root@linux:~# TMPDIR='/var/tmp'
root@linux:~# #
root@linux:~# mountpoint -q $TMPDIR || mount -t tmpfs tmpfs $TMPDIR
root@linux:~# zpool destroy $POOLNAME
root@linux:~# rm -f $TMPDIR/zpool.dat
root@linux:~# fallocate -l 128m $TMPDIR/zpool.dat
root@linux:~# zpool create $POOLNAME $TMPDIR/zpool.dat
root@linux:~# zfs create -V 64M -s -o sync=always -o logbias=throughput -b 128K $POOLNAME/zvol
root@linux:~# #
root@linux:~# fio --filename=/dev/zvol/$POOLNAME/zvol --sync=1 --rw=write --bs=8k --size=8k --io_limit=1m --numjobs=2 --name=issue-6238
issue-6238: (g=0): rw=write, bs=8K-8K/8K-8K/8K-8K, ioengine=sync, iodepth=1
...
fio-2.1.11
Starting 2 processes
[  232.954702] PANIC: zfs: allocating allocated segment(offset=679424 size=131072)
[  232.954702] 

Message from syslogd@linux at Aug 11 22:33:40 ...
 kernel:[  232.954702] PANIC: zfs: allocating allocated segment(offset=679424 size=131072)

Message from syslogd@linux at Aug 11 22:33:40 ...
 kernel:[  232.954702] 
^Zbs: 2 (f=2)
[1]+  Stopped                 fio --filename=/dev/zvol/$POOLNAME/zvol --sync=1 --rw=write --bs=8k --size=8k --io_limit=1m --numjobs=2 --name=issue-6238
root@linux:~# dmesg
[  232.954702] PANIC: zfs: allocating allocated segment(offset=679424 size=131072)
[  232.955889] Showing stack for process 2013
[  232.955892] CPU: 7 PID: 2013 Comm: txg_sync Tainted: P           OE   4.6.4 #1
[  232.955894]  0000000000000286 0000000067d8f2c9 ffffffff81315815 0000000000000003
[  232.955896]  ffff8800ba27fb90 ffffffffc0450a94 ffffffff810aca93 6c6c61203a73667a
[  232.955897]  20676e697461636f 657461636f6c6c61 6e656d6765732064 74657366666f2874
[  232.955899] Call Trace:
[  232.955903]  [<ffffffff81315815>] ? dump_stack+0x5c/0x77
[  232.955911]  [<ffffffffc0450a94>] ? vcmn_err+0x64/0xf0 [spl]
[  232.955913]  [<ffffffff810aca93>] ? select_idle_sibling+0x23/0x100
[  232.955915]  [<ffffffff810b0833>] ? put_prev_entity+0x33/0x7c0
[  232.955916]  [<ffffffff810b2357>] ? dequeue_task_fair+0x57/0x8e0
[  232.955918]  [<ffffffff810b6a0c>] ? pick_next_task_fair+0x10c/0x4a0
[  232.955920]  [<ffffffff8102b76f>] ? __switch_to+0x26f/0x530
[  232.955954]  [<ffffffffc06ae0a9>] ? zfs_panic_recover+0x69/0x80 [zfs]
[  232.955957]  [<ffffffffc0183141>] ? avl_find+0x51/0x90 [zavl]
[  232.955981]  [<ffffffffc06978af>] ? range_tree_add+0x2af/0x2c0 [zfs]
[  232.956008]  [<ffffffffc06ffaeb>] ? zio_add_child+0xfb/0x110 [zfs]
[  232.956032]  [<ffffffffc06938bc>] ? metaslab_free_dva+0x11c/0x380 [zfs]
[  232.956055]  [<ffffffffc0696178>] ? metaslab_free+0x88/0xd0 [zfs]
[  232.956079]  [<ffffffffc069d840>] ? spa_avz_build+0x130/0x130 [zfs]
[  232.956106]  [<ffffffffc06fd508>] ? zio_dva_free+0x18/0x20 [zfs]
[  232.956133]  [<ffffffffc07017c7>] ? zio_nowait+0xa7/0x140 [zfs]
[  232.956157]  [<ffffffffc069d87b>] ? spa_free_sync_cb+0x3b/0x50 [zfs]
[  232.956174]  [<ffffffffc0653430>] ? bplist_iterate+0x90/0xe0 [zfs]
[  232.956198]  [<ffffffffc06a15d6>] ? spa_sync+0x416/0xd10 [zfs]
[  232.956200]  [<ffffffff810bbfa4>] ? __wake_up+0x34/0x50
[  232.956224]  [<ffffffffc06b1cd1>] ? txg_sync_thread+0x2e1/0x4a0 [zfs]
[  232.956258]  [<ffffffffc06b19f0>] ? txg_sync_stop+0xd0/0xd0 [zfs]
[  232.956264]  [<ffffffffc044c4ae>] ? thread_generic_wrapper+0x7e/0xc0 [spl]
[  232.956269]  [<ffffffffc044c430>] ? __thread_exit+0x20/0x20 [spl]
[  232.956271]  [<ffffffff8109ac4f>] ? kthread+0xdf/0x100
[  232.956273]  [<ffffffff815cc3f2>] ? ret_from_fork+0x22/0x40
[  232.956274]  [<ffffffff8109ab70>] ? kthread_park+0x50/0x50

We could fix this by taking a larger range lock in zvol_get_data() to protect the whole dbuf before we call dmu_sync(). Comments/suggestions are welcome since this is my first time working on ZIL code.

@loli10K loli10K changed the title [WIP] Verify we can reproduce "panic: allocating allocated segment" on the buildbot Fix range locking in ZIL commit codepath Aug 11, 2017
@sempervictus sempervictus mentioned this pull request Aug 12, 2017
12 tasks
@loli10K loli10K force-pushed the issue-6238 branch 4 times, most recently from f83f635 to eeee4e2 Compare August 13, 2017 11:27
loli10K (Contributor, Author) commented Aug 14, 2017

It seems this creates a circular dependency in the taskq dispatching logic:

pid   | caller -> callee
-------------------------------------------------------------------------------
[8521 ] generic_make_request -> zvol_request
[8521 ] zvol_request -> zfs_range_lock (type=1, off=0, len=4096)
[8521 ] kretprobe_trampoline <- zfs_range_lock (type=1, off=0, len=4096)
[489  ] generic_make_request -> zvol_request
[489  ] zvol_request -> zfs_range_lock (type=1, off=4096, len=4096)
[489  ] kretprobe_trampoline <- zfs_range_lock (type=1, off=4096, len=4096)
[7918 ] dbuf_redirty -> dbuf_unoverride (db={.db_object=0, .db_offset=65536, .db_size=16384, .db_data=0xffff8ebed73b4000}, blkid=4, level='\000', db_objset=ERROR)
[8521 ] 0x0 <- zvol_request
[7491 ] taskq_thread -> zvol_write
[7918 ] dbuf_dirty <- dbuf_unoverride 
[7491 ] zvol_write -> uio_from_bio (bio{sector=0, size=4096})
[489  ] 0x0 <- zvol_request
[7491 ] zvol_write -> dmu_write_uio_dnode
[7491 ] kretprobe_trampoline <- dmu_write_uio_dnode
[7491 ] kretprobe_trampoline -> zvol_log_write (offset=0, size=4096, sync=1)
[7491 ] kretprobe_trampoline <- zvol_log_write
[7491 ] zvol_write -> zfs_range_unlock (type=1, off=0, len=4096)
[7491 ] kretprobe_trampoline <- zfs_range_unlock (type=1, off=0, len=4096)
[7491 ] zvol_write -> zil_commit
[7491 ] zil_commit_writer -> zvol_get_data
[7491 ] zvol_get_data -> zfs_range_lock (type=0, off=0, len=131072)
8521 : zvol_request() takes a RL_WRITER on 0-4096
489  : zvol_request() takes a RL_WRITER on 4096-4096
7491 : zvol_write() dispatched by 8521:zvol_request() on 0-4096
7491 : zvol_log_write() on 0-4096
7491 : RL_WRITER released on 0-4096
7491 : zil_commit()
7491 : zvol_get_data() is trying to take a RL_READER on 0-128K (volblocksize=128K) but has to wait for the other RL_WRITER taken on 4096-4096 by 489:zvol_request()

Unfortunately the second zvol_write on 4096-4096 is still pending:

root@linux:/usr/src/zfs# cat /proc/spl/taskq
taskq                       act  nthr  spwn  maxt   pri  mina         maxa  cura      flags
zvol/0                        1     1     0    32   100    64   2147483647    64   80000005
	active: [7491]zvol_write [zfs](0xffff8ebe9144c5e0)
	pend: zvol_write [zfs](0xffff8ebee3a34080)
(gdb) p ((zv_request_t*)0xffff8ebee3a34080)->zv->zv_name
$18 = "poolname/zvol", '\000' <repeats 242 times>
(gdb) p ((zv_request_t*)0xffff8ebee3a34080)->zv->zv_volblocksize
$19 = 131072
(gdb)  
(gdb) 
(gdb) p ((zv_request_t*)0xffff8ebee3a34080)->bio.bi_iter
$20 = {bi_sector = 8, bi_size = 4096, bi_idx = 0, bi_bvec_done = 0}
(gdb)  
(gdb) 
(gdb) p ((zv_request_t*)0xffff8ebee3a34080)->rl->r_off
$21 = 4096
(gdb) p ((zv_request_t*)0xffff8ebee3a34080)->rl->r_len
$22 = 4096
(gdb) 

Loading SPL with spl_taskq_thread_sequential=0 or spl_taskq_thread_dynamic=0 seems to "fix" the issue.

behlendorf (Contributor):

@loli10K thanks for digging into this!

We could fix this by taking a larger range lock in zvol_get_data() to protect the whole dbuf before we call dmu_sync().

Right, this is what I'd suggest, and in fact what the counterpart ZPL function zfs_get_data() already does, which is almost certainly why this is only reproducible with ZVOLs. Let's get @amotin's and @ahrens' thoughts on this too since it looks like OpenZFS will need the same fix.

loli10K (Contributor, Author) commented Aug 15, 2017

@behlendorf I don't have the infrastructure in place to build and test FreeBSD, but I can confirm this affects OpenZFS too; I've just reproduced this on current illumos:

> ::obey
No Language H here
> ::status
debugging crash dump vmcore.0 (64-bit) from openindiana
operating system: 5.11 master-0-g4dd3b53234 (i86pc)
image uuid: e27b181e-27cd-c2c3-80bd-beb77d7faaaa
panic message: assertion failed: dr->dt.dl.dr_override_state != DR_IN_DMU_SYNC, file: ../../common/fs/zfs/dbuf.c, line: 1224
dump content: all kernel and user pages
> $C
ffffff00095116f0 vpanic()
ffffff0009511720 0xfffffffffbdff898()
ffffff0009511760 dbuf_unoverride+0x11d(ffffff027f002700)
ffffff0009511790 dbuf_redirty+0x3d(ffffff027f002700)
ffffff00095117d0 dmu_buf_will_dirty+0x8d(ffffff025debb9e0, ffffff0287855600)
ffffff0009511870 dmu_write_impl+0xdf(ffffff02707c33a0, 1, 1000, 1000, fffffe00a8c9e000, ffffff0287855600)
ffffff00095118f0 dmu_write+0x98(ffffff026cdb2180, 1, 1000, 1000, fffffe00a8c9e000, ffffff0287855600)
ffffff0009511990 zvol_strategy+0x31e(ffffff025f8cc900)
ffffff00095119c0 bdev_strategy+0x61(ffffff025f8cc900)
ffffff0009511a20 spec_startio+0x8e(ffffff027dc5a400, ffffff00050e2c20, 1000, 1000, 100)
ffffff0009511ae0 spec_putapage+0xf8(ffffff027dc5a400, ffffff00050e2c20, ffffff0009511b20, ffffff0009511b28, 0, ffffff026be28698)
ffffff0009511b90 spec_putpage+0x275(ffffff027dc5a400, 0, 2000, 0, ffffff026be28698, 0)
ffffff0009511c10 fop_putpage+0x4c(ffffff027dc5a400, 0, 2000, 0, ffffff026be28698, 0)
ffffff0009511c70 vpm_sync_pages+0xb0(ffffff027dc5a400, 0, 2000, 1)
ffffff0009511d50 spec_write+0x328(ffffff0282e8f200, ffffff0009511e00, 10, ffffff026be28698, 0)
ffffff0009511dd0 fop_write+0x5b(ffffff0282e8f200, ffffff0009511e00, 10, ffffff026be28698, 0)
ffffff0009511eb0 pwrite64+0x276(3, 834b468, 2000, 0, 0)
ffffff0009511f00 _sys_sysenter_post_swapgs+0x237()
> 

I also tried to "fix" the second issue (the circular dependency in the taskq dispatching logic) with a second, separate commit introducing a new utility function zfs_range_locked() (I may need to change the name).

I will squash and rebase on master as soon as the buildbot is done.

amotin (Member) commented Aug 15, 2017

I tried to reproduce this on FreeBSD with the recipe provided above, but failed for some reason. I am not sure I am qualified enough in the DMU layer to review this, but I tend to suppose that whoever first implemented this rounding for the ZFS case had something in mind, and I don't see why ZVOLs should differ.

Resolved (outdated) review threads on module/zfs/zfs_rlock.c and module/zfs/zvol.c.
 */
if (zfs_range_locked(&zv->zv_range_lock, P2ALIGN(offset,
    zv->zv_volblocksize), zv->zv_volblocksize) > 1)
	need_sync = B_TRUE;
behlendorf (Contributor):

This works, but I worry about the impact on performance. You should be able to resolve this circular dependency similarly by making the sync check outside zvol_write(). This way zil_commit() never gets run in the taskq thread, where it might block, consuming the thread. Incidentally, this is why the zfs_range_lock() is always acquired in zvol_request().

need_sync = bio_is_fua(bio) || zv->zv_objset->os_sync == ZFS_SYNC_ALWAYS;

loli10K (Contributor, Author):

@behlendorf sorry, this is not completely clear to me: if zil_commit() is called directly by zvol_write() and we need to avoid running it from a taskq thread, I think we need to do it here in zvol_request()?

behlendorf (Contributor):

Exactly right, which is what will happen if we replace your zfs_range_locked() check with the same sync check from zvol_write(). All of zvol_write() will run synchronously in zvol_request(), like in your existing patch.

Resolved review thread on module/zfs/zvol.c.
ahrens (Member) commented Aug 15, 2017

I'll take a look at this later in the week. @prakashsurya may also be interested.

Before:
	if (start >= end)
		goto out;
After:
	if (start >= end) {
		error = SET_ERROR(EIO);
loli10K (Contributor, Author):

I'm adding an additional SET_ERROR(EIO) here; also, I don't understand why we need to check if start >= end.

behlendorf (Contributor):

IIRC there are cases where the kernel can submit zero-length discards with FUA set, in which case the function should only call zil_commit() (which was obviously missing). So this should be updated to leave error = 0. @tuxoko may remember the exact details here.

Contributor:

I have no idea what this check is for; we already check for size == 0 in zvol_request(), so I think it is safe to remove this.

Contributor:

It appears this check was added by commit 089fa91 as an optimization to skip expensive non-aligned discards. We'll want to keep it.

Contributor:

Oh, yes, you're right: start and end are rounded to the block size.

@loli10K loli10K removed the Status: Work in Progress Not yet ready for general review label Aug 16, 2017
behlendorf (Contributor) left a review:

LGTM aside from the zvol_discard() comment.

@behlendorf behlendorf requested a review from tuxoko August 16, 2017 22:12
Since OpenZFS 7578 (1b7c1e5) if we have a ZVOL with logbias=throughput
we will force WR_INDIRECT itxs in zvol_log_write() setting itx->itx_lr
offset and length to the offset and length of the BIO from
zvol_write()->zvol_log_write(): these offset and length are later used
to take a range lock in zilog->zl_get_data function: zvol_get_data().

Now suppose we have a ZVOL with blocksize=8K and push 4K writes to
offset 0: we will only be range-locking 0-4096. This means the
ASSERTion we make in dbuf_unoverride() is no longer valid because now
dmu_sync() is called from zilog's get_data functions holding a partial
lock on the dbuf.

Fix this by taking a range lock on the whole block in zvol_get_data().

Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
behlendorf (Contributor) left a review:

Thanks, looks good. @ahrens can you look this over? OpenZFS will want to pick up the zvol_get_data() range locking fix.

@behlendorf behlendorf merged commit f763c3d into openzfs:master Aug 21, 2017
@loli10K loli10K deleted the issue-6238 branch August 21, 2017 16:01
tonyhutter pushed a commit that referenced this pull request Aug 22, 2017
Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6238 
Closes #6315 
Closes #6356 
Closes #6477
SidBB pushed a commit to catalogicsoftware/zfs that referenced this pull request Aug 31, 2017
Fabian-Gruenbichler pushed a commit to Fabian-Gruenbichler/zfs that referenced this pull request Sep 29, 2017
@loli10K loli10K restored the issue-6238 branch November 1, 2018 15:41