
zfs send --dedup corrupts some files #7703

Closed
dioni21 opened this issue Jul 10, 2018 · 6 comments
Labels: Component: Send/Recv, Type: Defect

Comments

@dioni21
Contributor

dioni21 commented Jul 10, 2018

System information

Type Version/Name
Distribution Name Fedora
Distribution Version 27
Linux Kernel 4.17.3-100.fc27.x86_64
Architecture x86_64
ZFS Version zfs-0.7.9-1.fc27.x86_64
SPL Version spl-0.7.9-1.fc27.x86_64

Describe the problem you're observing

While transferring data to a new pool, I found some corrupted files. I only found them because I ran md5sum on every file on both the source and destination pools; no error was reported during the send/recv procedure.

Apparently, only the final part (block?) of each corrupted file is junk: content from somewhere else. All corrupted files had deduped counterparts, but not all deduped files were affected. Indeed, I found an example file which had 4 copies, of which only one was corrupted.
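For reference, the check was basically a checksum sweep of both mounted copies; roughly like this (exact paths made up):

    cd /oldpool/dataset && find . -type f -print0 | xargs -0 md5sum | sort -k2 > /tmp/old.md5
    cd /newpool/dataset && md5sum -c --quiet /tmp/old.md5   # prints only the files that differ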

Maybe related to #2210 and #3066, but not sure...

Describe how to reproduce the problem

This operation generates a bad dataset:

zfs send --dedup --props --large-block --replicate --embed --compressed oldpool/dataset@mysnap | mbuffer -m 2g -s 128k | zfs recv -vs newpool/dataset

This operation generates a good dataset:

zfs send --props --large-block --replicate --embed --compressed oldpool/dataset@mysnap | mbuffer -m 2g -s 128k | zfs recv -vs newpool/dataset

Include any warning/errors/backtraces from the system logs

No errors were found in the system logs.

@rincebrain
Contributor

Oh boy.

I don't suppose this is on a system where you can try running older ZoL versions and do a git bisect to figure out if there was a point in the past where this didn't (metaphorically) catch fire? (I wouldn't suggest doing anything else with the pools while doing such testing, and if you ended up going back to 0.6.5.X releases to test, you might end up either with a read-only import or reaching a point in the past where the pool won't import because of newer feature flags.)
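If you do get to bisect, the rough shape would be something like this (the tags are just an example of a known-bad release and an assumed-good older one):

    git clone https://github.com/zfsonlinux/zfs.git && cd zfs
    git bisect start
    git bisect bad zfs-0.7.9        # release that shows the corruption
    git bisect good zfs-0.6.5.11    # assumed-good older release
    # at each step: build against a matching SPL, load the modules, rerun the
    # send | recv test, then mark the result with:
    git bisect good                 # or: git bisect bad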

Do you know if using the "good" command to send foo@mysnap | recv bar/baz and then using the "bad" command to send bar/baz@mysnap | recv bar/try2 would also produce a bad stream? (That is, I'm curious whether the problem is still reproducible after a "good" send/recv, which might mean there's something strange about the way it's written on the old pool, or whether it reproduces itself on recv, which suggests a different kind of problem.)
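In other words, something along these lines (scratch dataset names invented):

    # step 1: "good" plain send into a scratch dataset
    zfs send --props --large-block --replicate --embed --compressed oldpool/dataset@mysnap | zfs recv -vs newpool/try1
    # step 2: "bad" --dedup send of the copy that was just received
    zfs send --dedup --props --large-block --replicate --embed --compressed newpool/try1@mysnap | zfs recv -vs newpool/try2
    # then compare checksums between newpool/try1 and newpool/try2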

What's the output of zpool get all on oldpool and newpool, and zfs get all on oldpool/dataset and wherever on newpool you're receiving it into?
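(If it's easier to diff, something like this keeps the pool name out of the output; just a sketch:)

    zpool get -H -o property,value all oldpool > /tmp/oldpool.props
    zpool get -H -o property,value all newpool > /tmp/newpool.props
    diff /tmp/oldpool.props /tmp/newpool.props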

I'm kind of curious if the patch in #7701 might be useful, though I don't think so at a second glance, since that looks like the problem described there probably would be a race, not 100% reproducible.

If it's always mangling the last block of a file, that's...very strange. Hm.

I don't suppose you would be able to share the respective good/bad send streams?

behlendorf added this to the 0.8.0 milestone Jul 12, 2018
@dioni21
Contributor Author

dioni21 commented Jul 13, 2018

Too many questions, let me try to answer some...

First, the source pool contains my "life" in data. That's why I ran md5sum on every file after transferring. So I do not want to experiment with it until it is fully transferred to the new drive and that drive becomes the main system drive. After that, I may try some experiments with the old data.

I already know (from zdb -ccc) that there are some lost free blocks in the old volume. I could not finish testing the new volume: zdb always dies with SIGSEGV, and the higher the number of inflight I/Os, the sooner it dies.
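(For the record, the knob I was varying is zdb's inflight I/O limit, along these lines; the value is arbitrary:)

    zdb -ccc -I 16 newpool    # -I limits the number of outstanding checksum I/Os (default 200)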

Also, that was just the first transfer. While doing a second, incremental transfer, I got a SIGSEGV right at the end of zfs recv, after every dataset had been transferred.

I don't suppose this is on a system where you can try running older ZoL versions and do a git bisect to figure out if there was a point in the past where this didn't (metaphorically) catch fire? (I wouldn't suggest doing anything else with the pools while doing such testing, and if you ended up going back to 0.6.5.X releases to test, you might end up either with a read-only import or reaching a point in the past where the pool won't import because of newer feature flags.)

Doing a full dataset transfer takes some time, especially because I use dedup=verify. I can try setting up another machine to do this later, but I cannot promise anything right now.

Do you know if using the "good" command to send foo@mysnap | recv bar/baz and then using the "bad" command to send bar/baz@mysnap | recv bar/try2 would also produce a bad stream? (That is, I'm curious whether the problem is still reproducible after a "good" send/recv, which might mean there's something strange about the way it's written on the old pool, or whether it reproduces itself on recv, which suggests a different kind of problem.)

Let's keep this idea in mind. I'll try it as soon as the latest md5sum run (after zfs send -I) finishes.

What's the output of zpool get all on oldpool and newpool, and zfs get all on oldpool/dataset and wherever on newpool you're receiving it into?

I could only find one unexpected difference: unsupported@com.datto:encryption inactive

In fact, I used to run git master until recently, but it was diverging too much from the stable branch, so I switched to the yum repo releases. I am not sure exactly when I made that change, though.

I'm kind of curious if the patch in #7701 might be useful, though I don't think so at a second glance, since that looks like the problem described there probably would be a race, not 100% reproducible.

I also think #7701 would not fix this, since it addresses a crash, not data corruption. But this COULD be a race. Note that two copies of the same dataset sent with --dedup both ended up corrupted, but not in the same files...

If it's always mangling the last block of a file, that's...very strange. Hm.

Also: it is not always the same files, and not every file with the same deduped content is affected... This looks like a race or a lost pointer somewhere.

I don't suppose you would be able to share the respective good/bad send streams?

All datasets contain very private data, so I prefer not to share them.

@rincebrain
Contributor

rincebrain commented Jul 13, 2018

@dioni21 Okay, so you can't share the datasets. That's fine.

Can you share the output of zdb -vvvvv oldpool/dataset [file id] versus zdb -vvvvv newpool/dataset [corrupted file id] versus zdb -vvvvv newpool/dataset [deduped file that is not corrupted id]?

(Specifically, for the above, I'm looking for the output from a file that's mangled on the receiver side, a file that shares blocks with the first one but isn't mangled on the receiver side, and the same information for those two files on oldpool.)

What do you mean, "lost free blocks in the old volume"?

Also, it seems likely that zfs send without -D would probably save you a bunch of time, particularly if you're receiving it into a dataset with dedup+verify enabled anyway, though you obv. wouldn't have run across this problem if you weren't using that, so bit of a mixed bag. :)

@dioni21
Contributor Author

dioni21 commented Jul 17, 2018

@rincebrain As you may have noticed in another issue (#7723), I've probably been hit by a recent kernel bug that generates panics during heavy I/O. AFAIK, my destination pool has been corrupted into a state where merely loading the zfs driver is enough to panic the host. I had to wipe one of the new disks to start copying again, but I still have the other one untouched for post-mortem.

Right now I am spending my (sleepless) nights trying to fix this mess.

Can you share the output of zdb -vvvvv oldpool/dataset [file id] versus zdb -vvvvv newpool/dataset [corrupted file id] versus zdb -vvvvv newpool/dataset [deduped file that is not corrupted id]?

Not right now. If I can make my system stop panicking and restart the copy, I'll try to redo the bug analysis.

(Specifically, for the above, I'm looking for the output from a file that's mangled on the receiver side, a file that shares blocks with the first one but isn't mangled on the receiver side, and the same information for those two files on oldpool.)

I did not understand what you want. If they were text files, I would send a diff. But how can I compare two zfs send streams? Especially considering that the contents would be very different: deduped data will be sent multiple times without --dedup, if I understood correctly.
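(Maybe zstreamdump could at least summarize the records of both streams for comparison, without keeping the raw data around? Untested guess:)

    zfs send --dedup --props --large-block --replicate --embed --compressed oldpool/dataset@mysnap | zstreamdump -v > /tmp/stream-dedup.txt
    zfs send --props --large-block --replicate --embed --compressed oldpool/dataset@mysnap | zstreamdump -v > /tmp/stream-plain.txt
    diff /tmp/stream-dedup.txt /tmp/stream-plain.txt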

What do you mean, "lost free blocks in the old volume"?

That's from the output of zdb -ccc. I'm not sure this is the exact message, but I think it is reported by the leak tracing and space map checks. From the zdb(8) manual:

     -L      Disable leak tracing and the loading of space maps.  By default, zdb verifies that all non-free blocks are referenced,
             which can be very expensive.
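(The message came from a plain full check, i.e. something like:)

    zdb -ccc oldpool    # full traversal with leak tracing; this is what reported the lost free blocks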

Also, it seems likely that zfs send without -D would probably save you a bunch of time, particularly if you're receiving it into a dataset with dedup+verify enabled anyway, though you obv. wouldn't have run across this problem if you weren't using that, so bit of a mixed bag. :)

As an end user, my expectation was that the receiving side would not redo dedup/compress/etc. processing if the source already sends the processed data.

Indeed, maybe I should just have added the new disks to the pool and used the auto-expand procedure, but I wanted to try a big zfs send operation in practice.

@rincebrain
Contributor

@dioni21 What I wanted was the text output of zdb -vvvv [dataset] [object id] for two files, one that had deduplicated blocks and was mangled on the new pool, and one that was not, on each of the old and new pool.

So if you have oldpool/dataset@snap1, and did zfs send -D [other flags] oldpool/dataset@snap1 | zfs recv newpool/dataset, and had two files, IntactFile which had deduplicated blocks but did not get mangled on newpool, and MangledFile which had deduplicated blocks but did get mangled in the send, I want:

  • zdb -vvvv oldpool/dataset@snap1 [IntactFile's object id on oldpool]
  • zdb -vvvv oldpool/dataset@snap1 [MangledFile's object id on oldpool]
  • zdb -vvvv newpool/dataset@snap1 [IntactFile's object id on newpool]
  • zdb -vvvv newpool/dataset@snap1 [MangledFile's object id on newpool]

(It would be even more convenient if IntactFile and MangledFile both shared the same blocks on oldpool, but that is not a requirement.)
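For the object ids, the inode number shown by ls -i should match the ZFS object id for a regular file, so roughly (file names invented):

    ls -i /oldpool/dataset/IntactFile /oldpool/dataset/MangledFile
    # then plug those numbers into the zdb commands above, e.g.:
    zdb -vvvv oldpool/dataset@snap1 1234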

I believe zfs recv can't generally reuse the dedup table info from a send stream - for example, if you're using one of the salted checksums (or even just a different checksum type), the checksums would vary between src and dst but still refer to the same data block, to say nothing of all the blocks that are going to vary above the actual "data" blocks. I think it'll be faster when bandwidth is the bottleneck with zfs send -D because it'll only be sending one copy of the relevant data block, and I believe send -D will go faster if the source is using e.g. sha256 because it can then just use the existing calculated checksum rather than redoing it, but other than that, no.

(It's worth noting that you can zfs send -D and recv into a pool with dedup=off without ending up with a dedup table, which is useful for some people when bandwidth is much more limited than CPU time.)
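(Roughly like this, with the dataset name invented; you can check afterwards that no dedup table was built on the destination:)

    zfs send -D oldpool/dataset@mysnap | zfs recv -v newpool/nodedup
    zfs get dedup newpool/nodedup    # stays off unless dedup is enabled on the destination
    zdb -DD newpool                  # dedup table (DDT) statistics for the pool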

@ahrens
Member

ahrens commented Apr 28, 2020

The send --dedup flag has been removed by #10212. New releases (2.0+) will ignore the --dedup flag to zfs send, so you won't experience this problem. It's unfortunate that we didn't get to a root cause of this :(
