Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rm_offset_get_from_fd merges logically separate extents #530

Closed
cebtenzzre opened this issue Sep 16, 2021 · 0 comments · May be fixed by #531
Closed

rm_offset_get_from_fd merges logically separate extents #530

cebtenzzre opened this issue Sep 16, 2021 · 0 comments · May be fixed by #531
Labels

Comments

@cebtenzzre
Copy link
Collaborator

cebtenzzre commented Sep 16, 2021

Description

When extents are physically adjacent, rmlint assumes they can safely be merged, even if they are not logically adjacent.

Steps to reproduce

Build rmlint from the develop branch with _RM_OFFSET_DEBUG=1 and fixes for #527, #528, and #529 applied.
Then run these commands:

$ dd if=/dev/urandom of=foo_src bs=200K oflag=sync count=1
$ xfs_io -f -c 'copy_range -l 100K foo_src' -c 'fsync' foo
$ xfs_io -f -c 'copy_range -l 100K -s 100K -d 200K foo_src' -c 'fsync' foo
$ cp --reflink foo bar
$ filefrag -vb1 foo bar
Filesystem type is: 9123683e
File size of foo is 307200 (307200 blocks of 1 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  102399: 734768922624..734769025023: 102400:             shared
   1:   204800..  307199: 734769025024..734769127423: 102400:             last,shared,eof
foo: 1 extent found
File size of bar is 307200 (307200 blocks of 1 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  102399: 734768922624..734769025023: 102400:             shared
   1:   204800..  307199: 734769025024..734769127423: 102400:             last,shared,eof
bar: 1 extent found
$ rmlint --is-reflink -vv foo bar
DEBUG: Testing if foo is clone of bar
DEBUG: Checking link type for foo vs bar
DEBUG: rm_offset_get_fiemap: fd=3, n_extents=1, file_offset=0
DEBUG: rm_offset_get_fiemap: fd=3, n_extents=1, file_offset=102400
DEBUG: rm_offset_get_fiemap: fd=4, n_extents=1, file_offset=0
DEBUG: rm_offset_get_fiemap: fd=4, n_extents=1, file_offset=102400
DEBUG: Offsets match at fd1=3, fd2=4, logical=0, physical=734768922624
DEBUG: Files are clones (share same data)
Link type for 'foo' and 'bar', result:
Reflink

Actual result

rmlint only checks one pair of physical offsets, which is probably harmless to --is-reflink but indicates that rm_offset_get_fiemap is merging the representation of these two extents, which is confusing and incorrect (and can cause --is-reflink false positives).

Expected result

$ rmlint --is-reflink -vv foo bar
DEBUG: Testing if foo is clone of bar
DEBUG: Checking link type for foo vs bar
DEBUG: rm_offset_get_fiemap: fd=3, n_extents=1, file_offset=0
DEBUG: rm_offset_get_fiemap: fd=3, n_extents=1, file_offset=102400
DEBUG: rm_offset_get_fiemap: fd=4, n_extents=1, file_offset=0
DEBUG: rm_offset_get_fiemap: fd=4, n_extents=1, file_offset=102400
DEBUG: Offsets match at fd1=3, fd2=4, logical=0, physical=734768922624
DEBUG: rm_offset_get_fiemap: fd=3, n_extents=1, file_offset=102400
DEBUG: rm_offset_get_fiemap: fd=4, n_extents=1, file_offset=102400
DEBUG: Offsets match at fd1=3, fd2=4, logical=102400, physical=734769025024
DEBUG: Files are clones (share same data)
Link type for 'foo' and 'bar', result:
Reflink

Note that rmlint reports two matching offsets, just like how filefrag reports two extents.

Versions

rmlint version 2.10.1 built from develop commit bdb591f, with _RM_OFFSET_DEBUG=1 and fixes for #527, #528, and #529 applied.

cebtenzzre added a commit to cebtenzzre/rmlint that referenced this issue Sep 16, 2021
There can be a hole between two physically adjacent extents. To avoid
breaking any code that calls this function, don't merge their
representation, since it completely hides the hole and makes the
physical length of the on-disk extent look larger than it really is.

Fixes sahib#530
cebtenzzre added a commit to cebtenzzre/rmlint that referenced this issue Sep 25, 2021
There can be a hole between two physically adjacent extents. To avoid
breaking any code that calls this function, don't merge their
representation, since it completely hides the hole and makes the
physical length of the on-disk extent look larger than it really is.

Fixes sahib#530
cebtenzzre added a commit to cebtenzzre/rmlint that referenced this issue Sep 28, 2021
There can be a hole between two physically adjacent extents. To avoid
breaking any code that calls this function, don't merge their
representation, since it completely hides the hole and makes the
physical length of the on-disk extent look larger than it really is.

Fixes sahib#530
cebtenzzre added a commit to cebtenzzre/rmlint that referenced this issue Aug 7, 2022
There can be a hole between two physically adjacent extents. To avoid
breaking any code that calls this function, don't merge their
representation, since it completely hides the hole and makes the
physical length of the on-disk extent look larger than it really is.

Fixes sahib#530
@cebtenzzre cebtenzzre linked a pull request Aug 9, 2022 that will close this issue
@cebtenzzre cebtenzzre removed the has-pr label Aug 9, 2022
cebtenzzre added a commit to cebtenzzre/rmlint that referenced this issue Sep 18, 2022
There can be a hole between two physically adjacent extents. To avoid
breaking any code that calls this function, don't merge their
representation, since it completely hides the hole and makes the
physical length of the on-disk extent look larger than it really is.

Fixes sahib#530
@cebtenzzre cebtenzzre changed the title rm_offset_get_fiemap merges logically separate extents rm_offset_get_from_fd merges logically separate extents Feb 22, 2023
cebtenzzre added a commit to cebtenzzre/rmlint that referenced this issue Feb 24, 2023
rmlint previously ignored holes between physically adjacent extents.
This could cause files to be detected as reflinked that weren't, if one
file had a complete extent and the other file had two pieces of it
separated by a hole.

Check for logical gaps as well as physical gaps to fix this problem.

Fixes sahib#530
cebtenzzre added a commit to cebtenzzre/rmlint that referenced this issue Feb 26, 2023
rmlint previously ignored holes between physically adjacent extents.
This could cause files to be detected as reflinked that weren't, if one
file had a complete extent and the other file had two pieces of it
separated by a hole.

Check for logical gaps as well as physical gaps to fix this problem.

Fixes sahib#530
intelfx pushed a commit to intelfx/rmlint that referenced this issue Mar 9, 2023
There can be a hole between two physically adjacent extents. To avoid
breaking any code that calls this function, don't merge their
representation, since it completely hides the hole and makes the
physical length of the on-disk extent look larger than it really is.

Fixes sahib#530
intelfx pushed a commit to intelfx/rmlint that referenced this issue Mar 9, 2023
There can be a hole between two physically adjacent extents. To avoid
breaking any code that calls this function, don't merge their
representation, since it completely hides the hole and makes the
physical length of the on-disk extent look larger than it really is.

Fixes sahib#530
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant