Implement FIDUPERANGE for Linux. #15393
base: master
Conversation
This resolves issue openzfs#11065 by implementing FIDUPERANGE on Linux.
I didn't check the pipeline and didn't think much about it, but I think this needs some changes.
module/os/linux/zfs/zpl_file_range.c (Outdated)
```c
	if (flags & REMAP_FILE_DEDUP) {
		/* Zero length means to clone everything to the end of the file
		 */
		if (len == 0)
			len = i_size_read(file_inode(src_file)) - src_off;
```
Suggested change:

```diff
-	if (flags & REMAP_FILE_DEDUP) {
-		/* Zero length means to clone everything to the end of the file
-		 */
-		if (len == 0)
-			len = i_size_read(file_inode(src_file)) - src_off;
+	/* Zero length means to clone everything to the end of the file */
+	if (len == 0)
+		len = i_size_read(file_inode(src_file)) - src_off;
+	if (flags & REMAP_FILE_DEDUP) {
```
This must stay above the REMAP_FILE_DEDUP condition, because if the REMAP_FILE_DEDUP flag isn't set we still need to calculate the length to the end of the file.
Yep. I already fixed that but haven't pushed the update yet.
Also, it definitely needs changes: without fixing the locking issue mentioned (i.e. as written here) it will currently deadlock in zfs_read. For testing, I have changed it locally to have a small race condition that I never trigger instead. The race is as described above: without being able to hold the write lock over the entire compare + clone period, there is a race between the compare and the clone. Put another way, if you want a testable version to play around with, I'm happy to push changes that are complete but have the race condition instead.
I have pushed a version that introduces locked versions of zfs_read and zfs_clone_range and uses them. This version is untested (working on it), and is simply meant to show what the initial changes would look like.

I have verified the current version at least trivially works, in the sense that it calls clone range on the correct ranges, clone range says it cloned bytes, and the results have the same checksums/share blocks/etc.

Okay, I figured out how to use zdb to show the actual blocks being used. So it's just the space usage that is not showing it through any method, and I can verify that this occurs with normal block cloning.
```c
 */
int
zfs_clone_range_locked(znode_t *inzp, uint64_t *inoffp, znode_t *outzp,
    uint64_t *outoffp, uint64_t *lenp, cred_t *cr, zfs_locked_range_t *src_zlr,
    zfs_locked_range_t *dst_zlr)
{
	zfsvfs_t *inzfsvfs, *outzfsvfs;
	objset_t *inos, *outos;
	dmu_buf_impl_t *db;
```
I don't like copying zfs_clone_range into a new function. For example, the encryption check is not done here, because it was added to zfs_clone_range but not to this function. Copying all of the "new" checks into this function is bad and should be avoided if possible, IMO. Based on your comment, I would suggest passing a boolean_t to zfs_clone_range, which can be used to check whether the lock is already held. Maybe it's also possible to check whether the range locks are already held.
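A sketch of what that boolean_t parameter could look like (the parameter name and elided bodies here are hypothetical, not the actual patch):

```c
/*
 * Hypothetical sketch: rangelocks_held tells zfs_clone_range() that the
 * caller (the FIDUPERANGE path) already holds the range locks, so it
 * skips zfs_rangelock_enter()/zfs_rangelock_exit() but keeps every
 * other check (encryption, etc.) in the one shared function, instead
 * of duplicating them into a zfs_clone_range_locked() copy.
 */
int
zfs_clone_range(znode_t *inzp, uint64_t *inoffp, znode_t *outzp,
    uint64_t *outoffp, uint64_t *lenp, cred_t *cr, boolean_t rangelocks_held)
{
	...
	if (!rangelocks_held)
		inlr = zfs_rangelock_enter(...);
	...
	if (!rangelocks_held)
		zfs_rangelock_exit(inlr);
	...
}
```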
I think this can work, but the duplicated code needs to be removed if possible.

Yeah, I have no problem doing that, I just didn't want to go down that path if folks have objections. I'll clean it up.
This resolves issue #11065 by implementing FIDUPERANGE on Linux.

It is a draft because I need a little help from someone more familiar with the ZFS code to see if there is a better way to do the locking. (Once it's all ready, I'll close this PR and create one with the proper template.)
FIDUPERANGE takes a src file, offset, length, and then an array of places to duplicate the range into.
The result status is per-destination.
For this, the correct locking (or at least one version) looks like this (IMHO, of course :P):
for each destination to duplicate a range into:
A. read lock the src range/offset
B. write lock the dst range/offset
C. compare the src and dest ranges to ensure they are the same.
D. if same, clone src into dst
E. release write lock.
F. release read lock.
Because the results are per-destination, it's actually okay if the src file changes after F but before the next iteration: we will simply report that the ranges for the remaining destinations are no longer the same when we compare them, and not remap.
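The per-destination flow above, as illustrative pseudocode (names are made up for the sketch, not the actual ZFS or Linux API):

```
for each (dst_file, dst_off) in destinations:
    src_lock = rangelock_enter(src, src_off, len, READER)   # A
    dst_lock = rangelock_enter(dst, dst_off, len, WRITER)   # B
    if ranges_identical(src, src_off, dst, dst_off, len):   # C
        clone_range(src, src_off, dst, dst_off, len)        # D
        status[dst] = FILE_DEDUPE_RANGE_SAME
    else:
        status[dst] = FILE_DEDUPE_RANGE_DIFFERS
    rangelock_exit(dst_lock)                                # E
    rangelock_exit(src_lock)                                # F
```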
Here are the immediate issues:
zfs_read (which I use to read and compare the files) wants to hold the read lock during C above. That is not possible since we have it write locked and they are not recursive locks. It's not obvious to me if zfs_read is the wrong thing to use here, or if I should create a zfs_read_already_locked that takes the locks, verifies they are right for the range, and proceeds without further locking.
zpl_clone_file_range has the same issue (it wants to write lock the destination).
Note: We could hold the read lock for all iterations of the loop, and avoid re-reading the src range for each destination.
I have not bothered to explore this tradeoff.
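That alternative, sketched in the same illustrative pseudocode as above:

```
src_lock = rangelock_enter(src, src_off, len, READER)   # held once, outside the loop
src_data = read(src, src_off, len)                      # read and cache the source once
for each (dst_file, dst_off) in destinations:
    dst_lock = rangelock_enter(dst, dst_off, len, WRITER)
    if src_data == read(dst, dst_off, len):
        clone_range(src, src_off, dst, dst_off, len)
    rangelock_exit(dst_lock)
rangelock_exit(src_lock)
```

This trades a longer source lock hold time for a single read of the source range, and it also removes the benign between-iteration race, since the source cannot change while the loop runs.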