
ZFS send/recv with ashift 9->12 leads to data corruption #12770

Merged (2 commits into openzfs:master, Dec 7, 2021)

Conversation

pcd1193182 (Contributor)

Motivation and Context

See #12762 for details.

Description

We improve the ability of zfs send to determine whether a block is compressed by using the information contained in the blkptr, rather than relying on an lsize vs psize difference (which is usually, but not always, a reliable indicator). Also removed an erroneous assertion in the ARC code path.

How Has This Been Tested?

Replicated the setup in the initial issue and verified that dozer/test-second and tank/test contained the same contents after the fix was applied.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

Signed-off-by: Paul Dagnelie <pcd@delphix.com>
rincebrain (Contributor) commented Nov 16, 2021

As an experiment, I did the following with this PR applied (and I'm reasonably confident I wasn't running the wrong module...):

# truncate -s 1G /disk1 [...] /disk6
# zpool create -o ashift=9 -O compression=lz4 srcpool raidz1 /disk1 /disk2 /disk3
# zpool create -o ashift=12 -O compression=lz4 dstpool raidz2 /disk4 /disk5 /disk6
# zfs create srcpool/testme
# dd if=[700 MB uncompressed disk image I have laying around] of=/srcpool/testme/file1 bs=1M
# zfs snap srcpool/testme@snap1
# zfs send -RLec srcpool/testme@snap1 | zfs recv dstpool/testme1
cannot receive new filesystem stream: checksum mismatch
#

Attempting the send a second time, to capture the send stream for you (the same command with | tee /tmp/sendstream in the middle), panicked the VM:

[  546.787844] BUG: unable to handle page fault for address: ffffad704c5c6000
[  546.787871] #PF: supervisor read access in kernel mode
[  546.787888] #PF: error_code(0x0000) - not-present page
[  546.787905] PGD 100000067 P4D 100000067 PUD 10018f067 PMD 153ed8067 PTE 0
[  546.787929] Oops: 0000 [#1] SMP NOPTI
[  546.787942] CPU: 4 PID: 2868 Comm: zfs Kdump: loaded Tainted: P           OE     5.10.0-9-amd64 #1 Debian 5.10.70-1
[  546.787975] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  546.788003] RIP: 0010:fletcher_4_avx2_native+0x38/0x80 [zcommon]
[  546.788023] Code: 16 53 48 89 f3 e8 18 ff ff ff c4 c1 7e 6f 04 24 c4 c1 7e 6f 4c 24 20 c4 c1 7e 6f 54 24 40 c4 c1 7e 6f 5c 24 60 48 39 eb 73 1e <c4> e2 7d 35 23 c5 fd d4 c4 c5 f5 d4 c8 c5 ed d4 d1 c5 e5 d4 da 48
[  546.788078] RSP: 0018:ffffad704c3e76a0 EFLAGS: 00010006
[  546.788096] RAX: 0000000000000000 RBX: ffffad704c5c6000 RCX: 0000000000000000
[  546.788118] RDX: 00000000ffffffff RSI: ffffad704c5b9000 RDI: ffff8d91f2477000
[  546.788139] RBP: ffffad704c5d9000 R08: 0000000000000000 R09: ffff8d9042a41a20
[  546.788160] R10: 0000000000000002 R11: ffff8d904a004d10 R12: ffffad704c3e76c0
[  546.788181] R13: ffffad704c5b9000 R14: 0000000000020000 R15: ffffad704c3e7ae8
[  546.788202] FS:  00007f4fbdb7e7c0(0000) GS:ffff8d924fd00000(0000) knlGS:0000000000000000
[  546.788226] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  546.788243] CR2: ffffad704c5c6000 CR3: 0000000121060000 CR4: 00000000000506e0
[  546.788267] Call Trace:
[  546.788282]  fletcher_4_native_impl+0x57/0x90 [zcommon]
[  546.788303]  ? __kernel_write+0x289/0x2d0
[  546.788318]  fletcher_4_native+0x108/0x170 [zcommon]
[  546.788338]  fletcher_4_incremental_native+0xc6/0x160 [zcommon]
[  546.788405]  dump_record+0xbb/0x200 [zfs]
[  546.788462]  dmu_dump_write+0x1a0/0x800 [zfs]
[  546.788522]  do_dump+0x9c0/0xda0 [zfs]
[  546.788537]  ? kfree+0x438/0x480
[  546.788590]  ? dmu_send_impl+0xddf/0x15c0 [zfs]
[  546.788666]  dmu_send_impl+0xdc6/0x15c0 [zfs]
[  546.788722]  ? dmu_buf_get_user+0x13/0x20 [zfs]
[  546.788783]  ? dsl_dataset_hold_obj+0xad/0xbe0 [zfs]
[  546.789278]  ? spl_kmem_alloc_impl+0xed/0x110 [spl]
[  546.789778]  ? tsd_hash_search.isra.0+0x6c/0x90 [spl]
[  546.790309]  dmu_send_obj+0x247/0x340 [zfs]
[  546.790761]  ? dbuf_read+0x462/0x6c0 [zfs]
[  546.791010]  zfs_ioc_send+0xe8/0x2c0 [zfs]
[  546.791248]  ? zfs_ioc_send+0x2c0/0x2c0 [zfs]
[  546.791486]  zfsdev_ioctl_common+0x78c/0x9c0 [zfs]
[  546.791735]  zfsdev_ioctl+0x53/0xe0 [zfs]
[  546.791945]  __x64_sys_ioctl+0x83/0xb0
[  546.792148]  do_syscall_64+0x33/0x80
[  546.792343]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  546.792544] RIP: 0033:0x7f4fbe163cc7
[  546.792756] Code: 00 00 00 48 8b 05 c9 91 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 99 91 0c 00 f7 d8 64 89 01 48
[  546.793324] RSP: 002b:00007ffcdc8b1c48 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  546.793550] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4fbe163cc7
[  546.793776] RDX: 00007ffcdc8b2070 RSI: 0000000000005a1c RDI: 0000000000000003
[  546.793983] RBP: 00007ffcdc8b5650 R08: 000055ca42ccd460 R09: 0000000000000000
[  546.794187] R10: 000055ca42cdb820 R11: 0000000000000246 R12: 000055ca42ccd460
[  546.794390] R13: 000055ca42ccfc90 R14: 00007ffcdc8b2070 R15: 000055ca42ccfca0

So that's new and exciting. (I was just looking to see how raidz played in the sandbox with p/lsize here...)

I can't reproduce it on VM reboot though... (I rebooted the VM before running this in the first place, though, so ???)

I'm going to go memtest just to be sure nothing is broken, since this system is reasonably new, but I haven't had any issues before this...

edit: It reproduces if I freshly recreate the pools and write the data anew, but not on reboot. How very odd...

rincebrain (Contributor) commented Nov 16, 2021

According to zstreamdump on reproducing it...

[...]
WRITE object = 3 type = 19 checksum type = 7 compression type = 0 flags = 0 offset = 1441792 logical_size = 131072 compressed_size = 0 payload_size = 131072 props = f000300ff salt = 0000000000000000 iv = 000000000000000000000000 mac = 00000000000000000000000000000000
    checksum = 870da21859d8/8d8ed46d90eacb05/ea3b9ac0326216b7/3b63f31b640e4f18
WRITE object = 3 type = 19 checksum type = 7 compression type = 0 flags = 0 offset = 1572864 logical_size = 131072 compressed_size = 0 payload_size = 131072 props = f000300ff salt = 0000000000000000 iv = 000000000000000000000000 mac = 00000000000000000000000000000000
    checksum = a9037b89ce8b/dadcc0a2af816c42/77f11e5abb4924bc/5e405a5f6618defd
WRITE object = 3 type = 19 checksum type = 7 compression type = 0 flags = 0 offset = 1703936 logical_size = 131072 compressed_size = 0 payload_size = 131072 props = f000300ff salt = 0000000000000000 iv = 000000000000000000000000 mac = 00000000000000000000000000000000
    checksum = cb2167efed14/390a641f372352a5/f258877bf1b66a77/fa3adcf4c1c3d269
WRITE object = 3 type = 19 checksum type = 7 compression type = 0 flags = 0 offset = 1835008 logical_size = 131072 compressed_size = 0 payload_size = 131072 props = f000300ff salt = 0000000000000000 iv = 000000000000000000000000 mac = 00000000000000000000000000000000
    checksum = ecf4e114ae19/a824dc6b5ebfecd3/a576b9ff4fdcc5f7/426f0590675b1009
WRITE object = 3 type = 19 checksum type = 7 compression type = 0 flags = 0 offset = 1966080 logical_size = 131072 compressed_size = 0 payload_size = 131072 props = f000300ff salt = 0000000000000000 iv = 000000000000000000000000 mac = 00000000000000000000000000000000
    checksum = 10f95c17c864b/289dc65e49eac6b2/6929e09a3eb5a09a/af7a3009a193dda9
invalid checksum
Incorrect checksum in record header.
Expected checksum = 131ac6af02adb/ba41d28c73d165b0/f1816a59f44dcdf4/5efd56dfe7b5a5b0

(If you'd like, the "invalid checksum" send output is here.)

(FYI, I just booted this patch on my non-VM testbed, and it panics the same way. So I don't think there's much chance of it being just my new system being flaky...)

(Review thread on module/zfs/dmu_send.c, since marked outdated and resolved.)
rincebrain (Contributor) commented Nov 16, 2021

I had a long writeup of what's going wrong, but since GH won't let you comment on unchanged lines:

I think the issue is that, as-is, your code never sets srdp->io_compressed if the arc_read call succeeds, so when the ARC has the data you end up treating it as uncompressed, and Bad Times Ensue. I would suggest setting it outside of the if, at which point I don't think you need the one inside the if any more.

Haven't tested it, yet, but it seems correct from what I observed (if it's in the ARC, you get into dmu_dump_write with io_compressed=0).

(Edit: Quickly testing with just a srdp->io_compressed = request_compressed; above the arc_read call seems to behave correctly for both the in ARC and not in ARC cases, but that's hardly exhaustive...)

Signed-off-by: Paul Dagnelie <pcd@delphix.com>
pcd1193182 (Contributor, Author)

This new commit should address the issue; I also tested the patch with sends hitting and missing in the ARC.

@mmaybee mmaybee added the Status: Accepted Ready to integrate (reviewed, tested) label Dec 7, 2021
@mmaybee mmaybee merged commit 3760273 into openzfs:master Dec 7, 2021
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Dec 8, 2021
Improve the ability of zfs send to determine if a block is compressed
or not by using information contained in the blkptr.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Matthew Ahrens <matthew.ahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#12770
snajpa pushed a commit to vpsfreecz/zfs that referenced this pull request Feb 6, 2022
nicman23 pushed a commit to nicman23/zfs that referenced this pull request Aug 22, 2022
snajpa pushed a commit to vpsfreecz/zfs that referenced this pull request Sep 18, 2022
snajpa pushed a commit to vpsfreecz/zfs that referenced this pull request Sep 19, 2022