-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix range locking in ZIL commit codepath #6477
Conversation
a4e4fc8
to
a360784
Compare
UPDATE: it seems this issue has been introduced (or it's more likely to be triggered) with OpenZFS 7578 - Fix/improve some aspects of ZIL writing (1b7c1e5). When we have a ZVOL with Now suppose we have a ZVOL with blocksize=8K and push 4K writes to offset 0: we will only be range-locking 0-4096. This means the ASSERTion we make in Even though it was initially thought this was somehow related to some combination of virtualization settings and storage drivers this is actually reproducible on a file-based vdev pool with
Without
We could fix this by taking a larger range lock in |
f83f635
to
eeee4e2
Compare
It seems this creates a circular dependency in the taskq dispatching logic:
Unfortunately the second
Loading SPL with |
@loli10K thanks for digging in to this!
Right, this is what I'd suggest and in fact what happens in the counterpart ZPL function |
@behlendorf i don't have the infrastructure in place to build and test FreeBSD but i can confirm this affects OpenZFS too, i've just reproduced this on current Illumos:
I also tried to "fix" the second issue (circular dependency in the taskq dispatching logic) with a second, separate commit introducing a new utility function I will squash and rebase on master as soon as the buildbot is done. |
I tried to reproduce this on FreeBSD with recipe provided above, but failed for some reason. I am not sure I am qualified enough in DMU layer to review this, but I tend to suppose that somebody who first implemented this rounding for ZFS case had something in his mind, and I don't see why ZVOL should differ. |
module/zfs/zvol.c
Outdated
*/ | ||
if (zfs_range_locked(&zv->zv_range_lock, P2ALIGN(offset, | ||
zv->zv_volblocksize), zv->zv_volblocksize) > 1) | ||
need_sync = B_TRUE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works but I worry about the impact on performance. You should be able to resolve this circular dependency similarly by making the sync check outside zvol_write()
. This way the zil_commit()
never gets run in the taskq thread where it might block consuming the thread. Incidentally, this is why the zfs_range_lock()
is always acquired in zvol_request()
.
need_sync = bio_is_fua(bio) || zv->zv_objset->os_sync == ZFS_SYNC_ALWAYS;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@behlendorf sorry this is not completely clear to me: if zil_commit()
is called directly by zvol_write()
and we need to avoid running from a taskq thread i think we need to do it here in zvol_request()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly right. Which is what will happen if we replace your zfs_range_locked()
check with the same check sync
check from zvol_write()
. All of zvol_write()
will run synchronously in zvol_request()
like your existing patch.
I'll take a look at this later in the week. @prakashsurya may also be interested. |
module/zfs/zvol.c
Outdated
if (start >= end) | ||
goto out; | ||
if (start >= end) { | ||
error = SET_ERROR(EIO); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm adding an additional SET_ERROR(EIO)
here: also i don't understand why we need to check if start >= end
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC correctly there are cases when the kernel can submit zero length discards and set FUA. In which case the function should only call zil_commit()
(which was obviously missing). So this should be updated to leave error = 0
. @tuxoko may better remember the exact details here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea what this check is for, we already check for size == 0
is zvol_request, so I think it is safe to remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears this check was added by commit 089fa91 as a optimization to skip expensive non-aligned discards. We'll want to keep this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yes you're right, start and end are rounded to block size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM aside from the zvol_discard() comment.
module/zfs/zvol.c
Outdated
if (start >= end) | ||
goto out; | ||
if (start >= end) { | ||
error = SET_ERROR(EIO); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC correctly there are cases when the kernel can submit zero length discards and set FUA. In which case the function should only call zil_commit()
(which was obviously missing). So this should be updated to leave error = 0
. @tuxoko may better remember the exact details here.
Since OpenZFS 7578 (1b7c1e5) if we have a ZVOL with logbias=throughput we will force WR_INDIRECT itxs in zvol_log_write() setting itx->itx_lr offset and length to the offset and length of the BIO from zvol_write()->zvol_log_write(): these offset and length are later used to take a range lock in zillog->zl_get_data function: zvol_get_data(). Now suppose we have a ZVOL with blocksize=8K and push 4K writes to offset 0: we will only be range-locking 0-4096. This means the ASSERTion we make in dbuf_unoverride() is no longer valid because now dmu_sync() is called from zilog's get_data functions holding a partial lock on the dbuf. Fix this by taking a range lock on the whole block in zvol_get_data(). Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good. @ahrens can you look this over, OpenZFS will want to pick up the zvol_get_data()
range locking fix.
Since OpenZFS 7578 (1b7c1e5) if we have a ZVOL with logbias=throughput we will force WR_INDIRECT itxs in zvol_log_write() setting itx->itx_lr offset and length to the offset and length of the BIO from zvol_write()->zvol_log_write(): these offset and length are later used to take a range lock in zillog->zl_get_data function: zvol_get_data(). Now suppose we have a ZVOL with blocksize=8K and push 4K writes to offset 0: we will only be range-locking 0-4096. This means the ASSERTion we make in dbuf_unoverride() is no longer valid because now dmu_sync() is called from zilog's get_data functions holding a partial lock on the dbuf. Fix this by taking a range lock on the whole block in zvol_get_data(). Reviewed-by: Chunwei Chen <tuxoko@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #6238 Closes #6315 Closes #6356 Closes #6477
Since OpenZFS 7578 (1b7c1e5) if we have a ZVOL with logbias=throughput we will force WR_INDIRECT itxs in zvol_log_write() setting itx->itx_lr offset and length to the offset and length of the BIO from zvol_write()->zvol_log_write(): these offset and length are later used to take a range lock in zillog->zl_get_data function: zvol_get_data(). Now suppose we have a ZVOL with blocksize=8K and push 4K writes to offset 0: we will only be range-locking 0-4096. This means the ASSERTion we make in dbuf_unoverride() is no longer valid because now dmu_sync() is called from zilog's get_data functions holding a partial lock on the dbuf. Fix this by taking a range lock on the whole block in zvol_get_data(). Reviewed-by: Chunwei Chen <tuxoko@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes openzfs#6238 Closes openzfs#6315 Closes openzfs#6356 Closes openzfs#6477
Since OpenZFS 7578 (1b7c1e5) if we have a ZVOL with logbias=throughput we will force WR_INDIRECT itxs in zvol_log_write() setting itx->itx_lr offset and length to the offset and length of the BIO from zvol_write()->zvol_log_write(): these offset and length are later used to take a range lock in zillog->zl_get_data function: zvol_get_data(). Now suppose we have a ZVOL with blocksize=8K and push 4K writes to offset 0: we will only be range-locking 0-4096. This means the ASSERTion we make in dbuf_unoverride() is no longer valid because now dmu_sync() is called from zilog's get_data functions holding a partial lock on the dbuf. Fix this by taking a range lock on the whole block in zvol_get_data(). Reviewed-by: Chunwei Chen <tuxoko@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes openzfs#6238 Closes openzfs#6315 Closes openzfs#6356 Closes openzfs#6477
Description
I'm trying to reproduce the issue described in #6238 reliably so we can add a new test case when it's fixed.
This PR will be updated with a proper fix when the code is ready.
Motivation and Context
Investigate #6238
Also #6315 and #6356 seems to be duplicates of the same issue.
How Has This Been Tested?
Local QEMU/KVM hypervisor with ZVOLs used directly (VirtIO) and exported via ISCSI. This need more investigation.
Types of changes
Checklist:
Signed-off-by
.