vfs: avoid problematic remapping requests into partial EOF block
A deduplication data corruption is exposed in XFS and btrfs. It is
caused by extending the block match range to include the partial EOF
block, but then allowing unknown data beyond EOF to be considered a
"match" to data in the destination file because the comparison is only
made to the end of the source file. This corrupts the destination file
when the source extent is shared with it.

The VFS remapping prep functions only support whole block dedupe, but
we still need to appear to support whole file dedupe correctly.  Hence
if the dedupe request includes the last block of the source file, don't
include it in the actual dedupe operation. If the rest of the range
dedupes successfully, then reject the entire request.  A subsequent
patch will enable us to shorten dedupe requests correctly.

When reflinking sub-file ranges, a data corruption can occur when the
source file range includes a partial EOF block. This shares the unknown
data beyond EOF into the second file at a position inside EOF, exposing
stale data in the second file.

If the reflink request includes the last block of the source file, only
proceed with the reflink operation if it lands at or past the
destination file's current EOF. If it lands within the destination file
EOF, reject the entire request with -EINVAL and make the caller go the
hard way.  A subsequent patch will enable us to shorten reflink requests
correctly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
djwong authored and dchinner committed Oct 29, 2018
1 parent 9fd91a9 commit 07d19dc
Showing 1 changed file (fs/read_write.c) with 33 additions and 0 deletions.
@@ -1708,6 +1708,34 @@ static int clone_verify_area(struct file *file, loff_t pos, u64 len, bool write)
 
 	return security_file_permission(file, write ? MAY_WRITE : MAY_READ);
 }
+/*
+ * Ensure that we don't remap a partial EOF block in the middle of something
+ * else. Assume that the offsets have already been checked for block
+ * alignment.
+ *
+ * For deduplication we always scale down to the previous block because we
+ * can't meaningfully compare post-EOF contents.
+ *
+ * For clone we only link a partial EOF block above the destination file's EOF.
+ */
+static int generic_remap_check_len(struct inode *inode_in,
+				   struct inode *inode_out,
+				   loff_t pos_out,
+				   u64 *len,
+				   bool is_dedupe)
+{
+	u64 blkmask = i_blocksize(inode_in) - 1;
+
+	if ((*len & blkmask) == 0)
+		return 0;
+
+	if (is_dedupe)
+		*len &= ~blkmask;
+	else if (pos_out + *len < i_size_read(inode_out))
+		return -EINVAL;
+
+	return 0;
+}
 
 /*
  * Check that the two inodes are eligible for cloning, the ranges make
@@ -1787,6 +1815,11 @@ int vfs_clone_file_prep(struct file *file_in, loff_t pos_in,
 			return -EBADE;
 	}
 
+	ret = generic_remap_check_len(inode_in, inode_out, pos_out, len,
+			is_dedupe);
+	if (ret)
+		return ret;
+
 	return 1;
 }
 EXPORT_SYMBOL(vfs_clone_file_prep);