Allow sending corrupt snapshots even if metadata is corrupted #12541

Merged: 1 commit, Sep 9, 2021

Conversation

allanjude

Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-By: Klara Inc.
Sponsored-By: WHC Online Solutions Inc.

Motivation and Context

After a pool was damaged by having part of its disks overwritten and was then recovered, a zfs send would fail with EIO whenever a data block failed its checksum.
Setting the zfs_send_corrupt_data tunable allowed the zfs send to continue, replacing the corrupted data.
However, when a corrupted blockpointer was encountered, the zfs send would still return ECKSUM and stop.
This change allows the zfs send to continue, skipping over the unreadable objects.
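
The difference between stopping at the first unreadable blockpointer and skipping past it can be sketched with a toy model. This is not OpenZFS code: `toy_object`, `toy_traverse`, and `TOY_ECKSUM` are hypothetical names invented for illustration, and the real traversal walks a block tree rather than a flat array.

```c
#include <stddef.h>

/* Hypothetical stand-in for the checksum error code; not the real ZFS value. */
#define	TOY_ECKSUM	1001

/* Toy object: records only whether its blockpointer can be read. */
struct toy_object {
	int bp_readable;	/* 1 if readable, 0 if corrupt */
};

/*
 * Walk the objects in order. In "hard" mode an unreadable blockpointer is
 * skipped; otherwise the walk stops at the first one and returns TOY_ECKSUM.
 * *sent receives the number of objects that were visited successfully.
 */
int
toy_traverse(const struct toy_object *objs, size_t n, int hard, size_t *sent)
{
	*sent = 0;
	for (size_t i = 0; i < n; i++) {
		if (!objs[i].bp_readable) {
			if (hard)
				continue;	/* skip the unreadable object */
			return (TOY_ECKSUM);	/* abort the whole walk */
		}
		(*sent)++;
	}
	return (0);
}
```

With one corrupt object in the middle of the array, the non-hard walk stops early with an error, while the hard walk reports success and counts every readable object.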

Description

When zfs_send_corrupt_data is set, use the TRAVERSE_HARD flag, so traverse_visitbp() will not fail with ECKSUM if a blockpointer cannot be read, but rather will continue and send the objects it can.
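
As a rough sketch of the idea, and not the actual diff, the flag selection might look like the following. The bit values and the names `TOY_TRAVERSE_BASE`, `TOY_TRAVERSE_HARD`, and `toy_send_traverse_flags` are assumptions made for illustration; `TOY_TRAVERSE_BASE` stands for whatever flags the send code already passes to the traversal.

```c
#include <stdint.h>

/* Illustrative bit values only; the real flags are defined by OpenZFS. */
#define	TOY_TRAVERSE_BASE	(1u << 0)	/* flags the send already uses */
#define	TOY_TRAVERSE_HARD	(1u << 1)	/* keep going past read errors */

/*
 * Build the traversal flags for a send. The hard flag is added only when
 * the tunable (modelled here as a plain int argument) is nonzero.
 */
uint32_t
toy_send_traverse_flags(int send_corrupt_data)
{
	uint32_t flags = TOY_TRAVERSE_BASE;

	if (send_corrupt_data)
		flags |= TOY_TRAVERSE_HARD;
	return (flags);
}
```

Keeping the hard flag conditional on the tunable means the default behavior is unchanged: a send still fails on corruption unless the administrator has opted in.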

How Has This Been Tested?

On a corrupted pool, a 9 TB zfs send ran to completion; without this change it failed after only a few GB of data.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

behlendorf added the "Status: Code Review Needed" (Ready for review and testing) label on Sep 8, 2021

behlendorf left a comment

Makes sense. LGTM.

jwk404 added the "Status: Accepted" (Ready to integrate (reviewed, tested)) label and removed the "Status: Code Review Needed" (Ready for review and testing) label on Sep 9, 2021
jwk404 merged commit a68e4b5 into openzfs:master on Sep 9, 2021
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 15, 2021
When zfs_send_corrupt_data is set, use the TRAVERSE_HARD flag,
so traverse_visitbp() will not fail with ECKSUM if a blockpointer
cannot be read, but rather will continue and send the objects it can.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-By: Klara Inc.
Sponsored-By: WHC Online Solutions Inc.
Closes openzfs#12541
rincebrain pushed a commit to rincebrain/zfs that referenced this pull request Sep 22, 2021