Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BRT: Fix holes cloning #16007

Closed
wants to merge 3 commits into from
Closed

BRT: Fix holes cloning #16007

wants to merge 3 commits into from

Conversation

amotin
Copy link
Member

@amotin amotin commented Mar 18, 2024

  • When reading L0 block pointers handle buffers without ones and without dirty records as a holes. Those appear when dnode size was increased, but the end was never written, so there are no new indirection levels to store the pointers. It makes no sense to return EAGAIN here, since sync won't create new indirection levels until there will be actual writes.
  • When cloning blocks set destination hole logical birth time to the current TXG. Otherwise if we are cloning over existing data, newly created holes may not be properly replicated later. Use BP_SET_BIRTH() when possible to not replicate its logic.

Closes: #15994

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@rrevans
Copy link
Contributor

rrevans commented Mar 19, 2024

The fix looks good, though probably want a new test in the CI suite for both cases?

Would also suggest adding assertions for the existing hole pointers having zero physical birth time during dmu_read_l0_bps if that's not already covered elsewhere.

@amotin amotin force-pushed the clone_hole branch 3 times, most recently from e634246 to ed54b55 Compare March 19, 2024 19:55
@amotin
Copy link
Member Author

amotin commented Mar 19, 2024

Would also suggest adding assertions for the existing hole pointers having zero physical birth time during dmu_read_l0_bps if that's not already covered elsewhere.

It don't think it is important in general to assert it. And doing it in dmu_read_l0_bps is even less useful, since it does not care, and wherever it could happen before, it already happened and we won't know where.

@amotin amotin force-pushed the clone_hole branch 3 times, most recently from fd15430 to a424355 Compare March 20, 2024 17:09
@amotin
Copy link
Member Author

amotin commented Mar 20, 2024

I've found that we do need to mess with physical birth time here, since previous block pointer could have zero there, thanks to BP_SET_BIRTH() logic. I've just rewritten it more explicitly using BP_SET_BIRTH() instead of incomplete local hack. Now I hope next set of tests to come clean aside of the infrastructure problems CI has.

@amotin amotin force-pushed the clone_hole branch 3 times, most recently from 5cf3960 to 37cc913 Compare March 21, 2024 12:39
Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. The cp_files_002_pos is known to be a bit racy, we need to add a retry there.

@implr
Copy link

implr commented Mar 22, 2024

There's still a lockup here for absurdly large sparse files. I found it with xfstests' generic/303, but the simplest reproducer I have is:

xfs_io -f -c "pwrite -S 0x61 9223372036854775806 1" file1
cp --reflink=always -p -f file1 file2

and the cp will never finish. perf indicates it spins in dbuf_find/dbuf_destroy/mutex_lock.

@amotin
Copy link
Member Author

amotin commented Mar 22, 2024

There's still a lockup here for absurdly large sparse files.

@implr I don't see how it is related to this PR. It can be considered a form of #16014 , or could be counted as separate issue, since we could think of adding signal handling to large zfs_clone_range() calls to be able to abort it.

@implr
Copy link

implr commented Mar 23, 2024

Oops, right, I didn't see that issue. Nevermind.

@behlendorf
Copy link
Contributor

@amotin if you can rebase this and resolve the conflict I'll go ahead and merge this.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Mar 25, 2024
 - When reading L0 block pointers handle buffers without ones and
without dirty records as a holes.  Those appear when dnode size
was increased, but the end was never written, so there are no new
indirection levels to store the pointers.  It makes no sense to
return EAGAIN here, since sync won't create new indirection levels
until there will be actual writes.
 - When cloning blocks set destination hole logical birth time
to the current TXG.  Otherwise if we are cloning over existing
data, newly created holes may not be properly replicated later.
Use BP_SET_BIRTH() when possible to not replicate its logic.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes: openzfs#15994
It should not normally happen, but if it does, better to not fail
everything for no good reason, or it may be hard to debug.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
This should allow to catch some leaks, if those happen.

While there fix some cosmetic issues.

Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
behlendorf pushed a commit that referenced this pull request Mar 27, 2024
It should not normally happen, but if it does, better to not fail
everything for no good reason, or it may be hard to debug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16007
behlendorf pushed a commit that referenced this pull request Mar 27, 2024
This should allow to catch some leaks, if those happen.

While there fix some cosmetic issues.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16007
@amotin amotin deleted the clone_hole branch March 28, 2024 00:48
amotin added a commit to amotin/zfs that referenced this pull request Apr 17, 2024
 - When reading L0 block pointers handle buffers without ones and
without dirty records as a holes.  Those appear when dnode size
was increased, but the end was never written, so there are no new
indirection levels to store the pointers.  It makes no sense to
return EAGAIN here, since sync won't create new indirection levels
until there will be actual writes.
 - When cloning blocks set destination hole logical birth time
to the current TXG.  Otherwise if we are cloning over existing
data, newly created holes may not be properly replicated later.
Use BP_SET_BIRTH() when possible to not replicate its logic.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes openzfs#15994
Closes openzfs#16007
amotin added a commit to amotin/zfs that referenced this pull request Apr 17, 2024
It should not normally happen, but if it does, better to not fail
everything for no good reason, or it may be hard to debug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes openzfs#16007
amotin added a commit to amotin/zfs that referenced this pull request Apr 17, 2024
This should allow to catch some leaks, if those happen.

While there fix some cosmetic issues.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes openzfs#16007
behlendorf pushed a commit that referenced this pull request Apr 19, 2024
 - When reading L0 block pointers handle buffers without ones and
without dirty records as a holes.  Those appear when dnode size
was increased, but the end was never written, so there are no new
indirection levels to store the pointers.  It makes no sense to
return EAGAIN here, since sync won't create new indirection levels
until there will be actual writes.
 - When cloning blocks set destination hole logical birth time
to the current TXG.  Otherwise if we are cloning over existing
data, newly created holes may not be properly replicated later.
Use BP_SET_BIRTH() when possible to not replicate its logic.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15994
Closes #16007
behlendorf pushed a commit that referenced this pull request Apr 19, 2024
It should not normally happen, but if it does, better to not fail
everything for no good reason, or it may be hard to debug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16007
behlendorf pushed a commit that referenced this pull request Apr 19, 2024
This should allow to catch some leaks, if those happen.

While there fix some cosmetic issues.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16007
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
 - When reading L0 block pointers handle buffers without ones and
without dirty records as a holes.  Those appear when dnode size
was increased, but the end was never written, so there are no new
indirection levels to store the pointers.  It makes no sense to
return EAGAIN here, since sync won't create new indirection levels
until there will be actual writes.
 - When cloning blocks set destination hole logical birth time
to the current TXG.  Otherwise if we are cloning over existing
data, newly created holes may not be properly replicated later.
Use BP_SET_BIRTH() when possible to not replicate its logic.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes openzfs#15994
Closes openzfs#16007
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
It should not normally happen, but if it does, better to not fail
everything for no good reason, or it may be hard to debug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes openzfs#16007
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
This should allow to catch some leaks, if those happen.

While there fix some cosmetic issues.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes openzfs#16007
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

zfs_bclone_wait_dirty=1 broken for files with unallocated blocks at the end
4 participants