implement corruption correcting recv #9372

Merged
merged 1 commit into from
Jul 28, 2022


alek-p
Contributor

@alek-p alek-p commented Sep 27, 2019

This patch implements a new type of zfs receive: corrective receive (-c). This type of recv is used to heal corrupted data when a replica of the data already exists (in the form of a sendfile for example).
Metadata cannot be healed using a corrective receive.

This patch enables us to receive a send stream into an existing snapshot for the purpose of correcting data corruption.

This is the updated version of the patch in #9323

Motivation and Context

In the past, in the rare cases where ZFS experienced permanent data corruption, full recovery of the dataset(s) was not always possible, even if replicas existed.
This patch makes recovery from permanent data corruption possible.

Description

For every write record in the send stream, we read the corresponding block from disk, and if that read fails with a checksum error, we overwrite that block with data from the send stream.
After the data is healed, we reread the block to make sure it's healed and remove the healed blocks from the corruption lists seen in zpool status.
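
The read, check, overwrite, and re-read loop described above can be sketched as a toy in-memory model. This is not the OpenZFS C implementation: the `Block`, `WriteRecord`, and `corrective_receive` names, the `(object, offset)` keys, the use of SHA-256, and the dictionary standing in for the on-disk dataset are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass


def checksum(data: bytes) -> str:
    # Stand-in for the block checksum (assumption: SHA-256 for simplicity).
    return hashlib.sha256(data).hexdigest()


@dataclass
class Block:
    data: bytes         # what is currently stored "on disk" in this model
    expected_sum: str   # checksum the intact block is supposed to have


@dataclass
class WriteRecord:
    key: tuple          # hypothetical (object, offset) identifying the block
    data: bytes         # payload carried by the send stream


def corrective_receive(disk: dict, stream: list, error_list: set) -> list:
    """Heal blocks whose read fails its checksum; return the healed keys."""
    healed = []
    for rec in stream:
        if not isinstance(rec, WriteRecord):
            continue  # only write records are considered for healing
        blk = disk.get(rec.key)
        if blk is None or checksum(blk.data) == blk.expected_sum:
            continue  # not present in this model, or the read succeeds
        if checksum(rec.data) != blk.expected_sum:
            continue  # assumed extra guard: stream data is not the right data
        blk.data = rec.data  # overwrite the corrupted block
        # Re-read to confirm the heal before clearing the error entry.
        if checksum(blk.data) == blk.expected_sum:
            error_list.discard(rec.key)
            healed.append(rec.key)
    return healed
```

In this model, a block that already reads back correctly is never touched, so replaying a full stream only rewrites the blocks that are actually damaged.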

To make sure we have correctly matched the data in the send stream to the right dataset to heal, there is a restriction that the GUID of the snapshot being received into must match the GUID in the send stream. Several snapshots are likely to refer to the same potentially corrupted data, so there may be many snapshots that satisfy this condition and are able to heal a single block.
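
As a minimal sketch of the GUID restriction, assuming a hypothetical `Snapshot` type and made-up GUID values (none of these names come from the patch itself), the check amounts to refusing the stream unless the two GUIDs are equal:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    name: str   # e.g. "data/fs@snap"
    guid: int   # GUID of the snapshot being received into


def guid_matches(target: Snapshot, stream_guid: int) -> bool:
    # The stream may heal the target only if it was generated from the
    # snapshot with the same GUID.
    return target.guid == stream_guid


def check_corrective_target(target: Snapshot, stream_guid: int) -> None:
    # Hypothetical error path: reject the receive on a GUID mismatch.
    if not guid_matches(target, stream_guid):
        raise ValueError(
            f"{target.name}: snapshot GUID {target.guid:#x} does not match "
            f"stream GUID {stream_guid:#x}"
        )
```

A stream taken from a different snapshot is rejected here even if it happens to contain the same block, which is the conservative behavior the restriction describes.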

The other thing to point out is that we can only correct data. Specifically, we are only able to heal records of type DRR_WRITE.

To help with the review you can see my OpenZFS dev summit 2019 talk for more context on this work:
video: https://www.youtube.com/watch?v=JldbtDATrOo
slides: https://drive.google.com/file/d/1Ysc_3bJWmsJCETFNTRCzyvpseDpzjjf2/view

How Has This Been Tested?

I've been running unit tests very similar to the test that I've added to zfs-tests.

Future Work

Since a DRR_SPILL record, like DRR_WRITE, also contains all of the data needed to recreate the damaged block, a future project could add support for healing DRR_SPILL records.
The next logical extension for part two of this work is to provide a way for a corrupted pool to tell a backup system to generate a minimal send stream in such a way as to enable the corrupted pool to be healed with this generated send stream.
The interface could be something like the following, but maybe there are better suggestions?

# dumps spa err list that are part of this snapshot and the snapshot guid
zfs send -C data/fs@snap > /tmp/errlist 

# on replica system generates healing sendfile based on the errors list
zfs send -cc /tmp/errlist backup_data > /tmp/healing_sendfile

# heal our data with the minimal healing sendfile
zfs recv -c data/fs@snap < /tmp/healing_sendfile
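
The idea behind the proposed minimal healing stream can be sketched as a filter on the replica side. This is only an illustration of the concept: the `WriteRecord` fields, the `(object, offset)` format of the error list, and the `minimal_healing_stream` helper are hypothetical and do not correspond to any existing `zfs send` behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteRecord:
    obj: int      # hypothetical object number the record belongs to
    offset: int   # hypothetical offset of the block within the object
    data: bytes   # block payload


def minimal_healing_stream(full_stream: list, errlist: list) -> list:
    """Keep only the write records that cover a block named in errlist."""
    wanted = set(errlist)  # (obj, offset) pairs reported by the damaged pool
    return [r for r in full_stream if (r.obj, r.offset) in wanted]
```

With a low-throughput link, only the records for the damaged blocks would need to cross the wire under this scheme, rather than the full send stream.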

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

@alek-p alek-p force-pushed the healing_recv branch 4 times, most recently from d17126b to 4312e3e Compare September 28, 2019 22:26
@behlendorf behlendorf added the Status: Design Review Needed Architecture or design is under discussion label Sep 30, 2019
@ahrens ahrens mentioned this pull request Oct 1, 2019
12 tasks
@alek-p alek-p added the Status: Work in Progress Not yet ready for general review label Oct 2, 2019
@megari
Contributor

megari commented Oct 7, 2019

This feature is indeed really nice to have. However, I am curious about whether it would - even theoretically - be possible to heal metadata using a corrective receive. Support for that would definitely be a killer feature.

@alek-p
Contributor Author

alek-p commented Oct 7, 2019

> This feature is indeed really nice to have. However, I am curious about whether it would - even theoretically - be possible to heal metadata using a corrective receive. Support for that would definitely be a killer feature.

I agree that it would be great to be able to heal metadata but as far as I know, there isn't enough information in the send file to do that.
We are only able to heal records of type DRR_WRITE and DRR_SPILL since those are the only ones (again afaik) that contain all of the data needed to recreate damaged blocks.

@alek-p alek-p force-pushed the healing_recv branch 2 times, most recently from 60e6d08 to 07a01dc Compare October 22, 2019 06:21
@alek-p alek-p force-pushed the healing_recv branch 3 times, most recently from 9cb189c to 94deaf5 Compare October 31, 2019 04:59
@alek-p alek-p added the Component: Send/Recv "zfs send/recv" feature label Nov 2, 2019
@alek-p alek-p removed the Status: Work in Progress Not yet ready for general review label Nov 5, 2019
@alek-p
Contributor Author

alek-p commented Nov 5, 2019

I've fixed the re-encryption code so this is ready for review now.

@codecov

codecov bot commented Nov 7, 2019

Codecov Report

Merging #9372 into master will decrease coverage by <1%.
The diff coverage is 70%.


@@           Coverage Diff            @@
##           master    #9372    +/-   ##
========================================
- Coverage      80%      79%   -<1%     
========================================
  Files         384      384            
  Lines      121788   122069   +281     
========================================
- Hits        96900    96897     -3     
- Misses      24888    25172   +284
Flag Coverage Δ
#kernel 80% <72%> (ø) ⬇️
#user 67% <15%> (ø) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a340316...d3ec54e. Read the comment docs.

andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This type of recv is used to heal corrupted data when a replica
of the data already exists (in the form of a send file for example).
With the provided send stream, corrective receive will read from
disk blocks described by the WRITE records. When any of the reads
come back with ECKSUM we use the data from the corresponding WRITE
record to rewrite the corrupted block.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes openzfs#9372
@pepsinio

Would this fix make it to release soon?

@GregorKopka
Contributor

> Would this fix make it to release soon?

@behlendorf ?

@behlendorf
Contributor

This feature will make it into the OpenZFS 2.2 release.

@pepsinio

This is a huge improvement. It is the last missing piece for my use case: offsite backup, no RaidZ, and a low-throughput link, which makes it difficult to recreate datasets easily in case of errors. I'm quite certain I am not the only one in this boat.

@FlorianHeigl

@pepsinio you're not the only one in that boat. I have followed this topic since before the PR was opened. I have no WAN use case, but the general understanding is that this is a major resilience feature that stabilizes many related bits and bytes.

behlendorf added a commit that referenced this pull request Jun 30, 2023
New features:
- Fully adaptive ARC eviction (#14359)
- Block cloning (#13392)
- Scrub error log (#12812, #12355)
- Linux container support (#14070, #14097, #12263)
- BLAKE3 Checksums (#12918)
- Corrective "zfs receive" (#9372)

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Dec 12, 2023
New features:
- Fully adaptive ARC eviction (openzfs#14359)
- Block cloning (openzfs#13392)
- Scrub error log (openzfs#12812, openzfs#12355)
- Linux container support (openzfs#14070, openzfs#14097, openzfs#12263)
- BLAKE3 Checksums (openzfs#12918)
- Corrective "zfs receive" (openzfs#9372)

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Labels
Component: Send/Recv "zfs send/recv" feature Status: Accepted Ready to integrate (reviewed, tested) Type: Feature Feature request or new feature