
zfs send --dedup corrupts some files #7703

Closed
dioni21 opened this issue Jul 10, 2018 · 6 comments
Labels: Component: Send/Recv, Type: Defect

Comments

@dioni21
Contributor

dioni21 commented Jul 10, 2018

System information

Type Version/Name
Distribution Name Fedora
Distribution Version 27
Linux Kernel 4.17.3-100.fc27.x86_64
Architecture x86_64
ZFS Version zfs-0.7.9-1.fc27.x86_64
SPL Version spl-0.7.9-1.fc27.x86_64

Describe the problem you're observing

While transferring data to a new pool, I found some corrupted files. I only found them because I ran md5sum on every file on both the source and destination pools; no error was reported during the send/recv procedure.

Apparently, only the final part (block?) of each corrupted file is junk: content from somewhere else. All corrupted files had deduped counterparts, but not all deduped files were affected. Indeed, I found an example file which had 4 copies, of which only one was corrupted.
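For reference, the check was basically a checksum sweep of both mounted copies; roughly like this (exact paths made up):

    cd /oldpool/dataset && find . -type f -print0 | xargs -0 md5sum | sort -k2 > /tmp/old.md5
    cd /newpool/dataset && md5sum -c --quiet /tmp/old.md5   # prints only the files that differ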

Maybe related to #2210 and #3066, but not sure...

Describe how to reproduce the problem

This operation generates a bad dataset:

zfs send --dedup --props --large-block --replicate --embed --compressed oldpool/dataset@mysnap | mbuffer -m 2g -s 128k | zfs recv -vs newpool/dataset

This operation generates a good dataset:

zfs send --props --large-block --replicate --embed --compressed oldpool/dataset@mysnap | mbuffer -m 2g -s 128k | zfs recv -vs newpool/dataset

Include any warning/errors/backtraces from the system logs

No errors were found in the system logs.

@rincebrain
Contributor

Oh boy.

I don't suppose this is on a system where you can try running older ZoL versions and do a git bisect to figure out if there was a point in the past where this didn't (metaphorically) catch fire? (I wouldn't suggest doing anything else with the pools while doing such testing, and if you ended up going back to 0.6.5.X releases to test, you might end up either with a read-only import or reaching a point in the past where the pool won't import because of newer feature flags.)
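If you do get to bisect, the rough shape would be something like this (the tags are just an example of a known-bad release and an assumed-good older one):

    git clone https://github.com/zfsonlinux/zfs.git && cd zfs
    git bisect start
    git bisect bad zfs-0.7.9        # release that shows the corruption
    git bisect good zfs-0.6.5.11    # assumed-good older release
    # at each step: build against a matching SPL, load the modules, rerun the
    # send | recv test, then mark the result with:
    git bisect good                 # or: git bisect bad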

Do you know if using the "good" command to send foo@mysnap | recv bar/baz and then using the "bad" command to send bar/baz@mysnap | recv bar/try2 would also produce a bad stream? (That is, I'm curious whether the problem is still reproducible after a "good" send/recv, which might mean there's something strange about the way it's written on the old pool, or whether it reproduces itself on recv, which suggests a different kind of problem.)
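In other words, something along these lines (scratch dataset names invented):

    # step 1: "good" plain send into a scratch dataset
    zfs send --props --large-block --replicate --embed --compressed oldpool/dataset@mysnap | zfs recv -vs newpool/try1
    # step 2: "bad" --dedup send of the copy that was just received
    zfs send --dedup --props --large-block --replicate --embed --compressed newpool/try1@mysnap | zfs recv -vs newpool/try2
    # then compare checksums between newpool/try1 and newpool/try2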

What's the output of zpool get all on oldpool and newpool, and zfs get all on oldpool/dataset and wherever on newpool you're receiving it into?
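(If it's easier to diff, something like this keeps the pool name out of the output; just a sketch:)

    zpool get -H -o property,value all oldpool > /tmp/oldpool.props
    zpool get -H -o property,value all newpool > /tmp/newpool.props
    diff /tmp/oldpool.props /tmp/newpool.props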

I'm kind of curious if the patch in #7701 might be useful, though I don't think so at a second glance, since that looks like the problem described there probably would be a race, not 100% reproducible.

If it's always mangling the last block of a file, that's...very strange. Hm.

I don't suppose you would be able to share the respective good/bad send streams?

behlendorf added this to the 0.8.0 milestone Jul 12, 2018
@dioni21
Contributor Author

dioni21 commented Jul 13, 2018

Too many questions, let me try to answer some...

First, the source pool contains my "life" in data. That's why I ran md5sum on every file after transferring. So I do not want to experiment with it until it is fully transferred to the new drive and that drive becomes the main system drive. After that, I may try some experiments with the old data.

I already know (from zdb -ccc) that there are some lost free blocks in the old volume. I could not finish testing the new volume: zdb always dies with SIGSEGV, and the higher the number of inflight I/Os, the sooner it dies.
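(For the record, the knob I was varying is zdb's inflight I/O limit, along these lines; the value is arbitrary:)

    zdb -ccc -I 16 newpool    # -I limits the number of outstanding checksum I/Os (default 200)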

Also, that was just the first transfer. While doing a second, incremental transfer, I got a SIGSEGV right at the end of zfs recv, after every dataset had been transferred.

I don't suppose this is on a system where you can try running older ZoL versions and do a git bisect to figure out if there was a point in the past where this didn't (metaphorically) catch fire? (I wouldn't suggest doing anything else with the pools while doing such testing, and if you ended up going back to 0.6.5.X releases to test, you might end up either with a read-only import or reaching a point in the past where the pool won't import because of newer feature flags.)

Doing a full dataset transfer takes some time, especially because I use dedup=verify. I can try setting up another machine to do this later, but I cannot promise anything right now.

Do you know if using the "good" command to send foo@mysnap | recv bar/baz and then using the "bad" command to send bar/baz@mysnap | recv bar/try2 would also produce a bad stream? (That is, I'm curious whether the problem is still reproducible after a "good" send/recv, which might mean there's something strange about the way it's written on the old pool, or whether it reproduces itself on recv, which suggests a different kind of problem.)

Let's keep this idea in mind. I'll try it as soon as the latest md5sum run (after zfs send -I) finishes.

What's the output of zpool get all on oldpool and newpool, and zfs get all on oldpool/dataset and wherever on newpool you're receiving it into?

I could only find one unexpected difference: unsupported@com.datto:encryption inactive

In fact, I used to run git master until recently, but it was diverging too much from the stable branch, so I switched to the yum repo releases. I am not sure exactly when I made that change, though.

I'm kind of curious if the patch in #7701 might be useful, though I don't think so at a second glance, since that looks like the problem described there probably would be a race, not 100% reproducible.

I also think #7701 would not fix this, since it addresses a crash, not data corruption. But this COULD be a race. Note that two copies of the same dataset sent with --dedup both ended up corrupted, but not in the same files...

If it's always mangling the last block of a file, that's...very strange. Hm.

Also: it is not always the same files, and not every file with the same deduped content is affected... This looks like a race or a lost pointer somewhere.

I don't suppose you would be able to share the respective good/bad send streams?

All datasets contain very private data, so I prefer not to share them.

@rincebrain
Contributor

rincebrain commented Jul 13, 2018

@dioni21 Okay, so you can't share the datasets. That's fine.

Can you share the output of zdb -vvvvv oldpool/dataset [file id] versus zdb -vvvvv newpool/dataset [corrupted file id] versus zdb -vvvvv newpool/dataset [deduped file that is not corrupted id]?

(Specifically, for the above, I'm looking for the output from a file that's mangled on the receiver side, a file that shares blocks with the first one but isn't mangled on the receiver side, and the same information for those two files on oldpool.)

What do you mean, "lost free blocks in the old volume"?

Also, it seems likely that zfs send without -D would probably save you a bunch of time, particularly if you're receiving it into a dataset with dedup+verify enabled anyway, though you obv. wouldn't have run across this problem if you weren't using that, so bit of a mixed bag. :)

@dioni21
Contributor Author

dioni21 commented Jul 17, 2018

@rincebrain As you may have noticed in another issue (#7723), I've probably been hit by a recent kernel bug that generates panics during heavy I/O. AFAIK, my destination pool has been corrupted into a state where merely loading the zfs driver is enough to panic the host. I had to wipe one of the new disks to start copying again, but I still have the other one untouched for post-mortem.

Right now I am spending my (sleepless) nights trying to fix this mess.

Can you share the output of zdb -vvvvv oldpool/dataset [file id] versus zdb -vvvvv newpool/dataset [corrupted file id] versus zdb -vvvvv newpool/dataset [deduped file that is not corrupted id]?

Not right now. If I can make my system stop panicking and restart the copy, I'll try to redo the bug analysis.

(Specifically, for the above, I'm looking for the output from a file that's mangled on the receiver side, a file that shares blocks with the first one but isn't mangled on the receiver side, and the same information for those two files on oldpool.)

I did not understand what you want. If they were text files, I would send a diff. But how can I compare two zfs send streams? Especially considering that the contents would be very different: deduped data will be sent multiple times without --dedup, if I understood correctly.
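(Maybe zstreamdump could at least summarize the records of both streams for comparison, without keeping the raw data around? Untested guess:)

    zfs send --dedup --props --large-block --replicate --embed --compressed oldpool/dataset@mysnap | zstreamdump -v > /tmp/stream-dedup.txt
    zfs send --props --large-block --replicate --embed --compressed oldpool/dataset@mysnap | zstreamdump -v > /tmp/stream-plain.txt
    diff /tmp/stream-dedup.txt /tmp/stream-plain.txt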

What do you mean, "lost free blocks in the old volume"?

That's from the output of zdb -ccc. I'm not sure this is the exact message, but I think it is reported by the leak tracing and space map checks. From the zdb(8) manual:

     -L      Disable leak tracing and the loading of space maps.  By default, zdb verifies that all non-free blocks are referenced,
             which can be very expensive.
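(The message came from a plain full check, i.e. something like:)

    zdb -ccc oldpool    # full traversal with leak tracing; this is what reported the lost free blocks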

Also, it seems likely that zfs send without -D would probably save you a bunch of time, particularly if you're receiving it into a dataset with dedup+verify enabled anyway, though you obv. wouldn't have run across this problem if you weren't using that, so bit of a mixed bag. :)

As an end user, my expectation was that the receiving side would not redo dedup/compress/etc. processing if the source already sends the processed data.

Indeed, maybe I should just have added the new disks to the pool and used the auto-expand procedure, but I wanted to try a big zfs send operation in practice.

@rincebrain
Contributor

@dioni21 What I wanted was the text output of zdb -vvvv [dataset] [object id] for two files, one that had deduplicated blocks and was mangled on the new pool, and one that was not, on each of the old and new pool.

So if you have oldpool/dataset@snap1, and did zfs send -D [other flags] oldpool/dataset@snap1 | zfs recv newpool/dataset, and had two files, IntactFile which had deduplicated blocks but did not get mangled on newpool, and MangledFile which had deduplicated blocks but did get mangled in the send, I want:

  • zdb -vvvv oldpool/dataset@snap1 [IntactFile's object id on oldpool]
  • zdb -vvvv oldpool/dataset@snap1 [MangledFile's object id on oldpool]
  • zdb -vvvv newpool/dataset@snap1 [IntactFile's object id on newpool]
  • zdb -vvvv newpool/dataset@snap1 [MangledFile's object id on newpool]

(It would be even more convenient if IntactFile and MangledFile both shared the same blocks on oldpool, but that is not a requirement.)
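For the object ids, the inode number shown by ls -i should match the ZFS object id for a regular file, so roughly (file names invented):

    ls -i /oldpool/dataset/IntactFile /oldpool/dataset/MangledFile
    # then plug those numbers into the zdb commands above, e.g.:
    zdb -vvvv oldpool/dataset@snap1 1234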

I believe zfs recv can't generally reuse the dedup table info from a send stream - for example, if you're using one of the salted checksums (or even just a different checksum type), the checksums would vary between src and dst but still refer to the same data block, to say nothing of all the blocks that are going to vary above the actual "data" blocks. I think it'll be faster when bandwidth is the bottleneck with zfs send -D because it'll only be sending one copy of the relevant data block, and I believe send -D will go faster if the source is using e.g. sha256 because it can then just use the existing calculated checksum rather than redoing it, but other than that, no.

(It's worth noting that you can zfs send -D and recv into a pool with dedup=off without ending up with a dedup table, which is useful for some people when bandwidth is much more limited than CPU time.)
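(Roughly like this, with the dataset name invented; you can check afterwards that no dedup table was built on the destination:)

    zfs send -D oldpool/dataset@mysnap | zfs recv -v newpool/nodedup
    zfs get dedup newpool/nodedup    # stays off unless dedup is enabled on the destination
    zdb -DD newpool                  # dedup table (DDT) statistics for the pool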

@ahrens
Member

ahrens commented Apr 28, 2020

The send --dedup flag has been removed by #10212. New releases (2.0+) will ignore the --dedup flag to zfs send, so you won't experience this problem. It's unfortunate that we didn't get to a root cause of this :(
