Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cancel in-progress rebuilds when we finish removal #13498

Merged
merged 1 commit into from
May 25, 2022

Conversation

pcd1193182
Copy link
Contributor

Motivation and Context

This issue was discovered by a zloop runs. When a mirror or other redundant top-level vdev has a disk failure, and the disk is replaced, the rebuild process occurs. A removal can happen while this is in progress. If the removal completes before the rebuild does, the removal process will try to free the vdev that is still in use.

Description

We cancel in-progress rebuilds at the end of the removal process, but before the vdev is freed.

How Has This Been Tested?

zloop runs, manual test with deliberately slowed rebuild

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix LGTM, but this kind of highlights an asymmetry in the existing behavior. A resilver cannot be started while a removal is in progress, however a removal can be started while resilvering. I wonder if we should make this consistent.

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label May 24, 2022
@pcd1193182
Copy link
Contributor Author

@behlendorf It would probably be reasonable to make that change as well, preventing a removal from starting during a rebuild. If the removal isn't an accident, you probably don't want the added stress of a rebuild on the disk you're removing (and which might a failure hazard, since it already lost one vdev in its group).

@behlendorf
Copy link
Contributor

you probably don't want the added stress of a rebuild on the disk you're removing

Right. My guess would be this was originally allowed so you could still scrub the pool while removing. That's still extra stress on the disk, but at least you're not down any redundancy. I think we'd still want to allow this.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels May 25, 2022
@behlendorf behlendorf merged commit 7829b46 into openzfs:master May 25, 2022
@behlendorf
Copy link
Contributor

Merged. We can make the other change in a different PR if we decide it's worthwhile. In the meanwhile there's no reason to hold this up.

nicman23 pushed a commit to nicman23/zfs that referenced this pull request Aug 22, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
nicman23 pushed a commit to nicman23/zfs that referenced this pull request Aug 22, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 12, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
beren12 pushed a commit to beren12/zfs that referenced this pull request Sep 19, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#13498
bsdjhb pushed a commit to CTSRD-CHERI/zfs that referenced this pull request Jul 13, 2023
This is not associated with a specific upstream commit but apparently
a local diff applied as part of:

commit e3aa18ad71782a73d3dd9dd3d526bbd2b607ca16
Merge: 645886d028c8 b9d9845
Author: Martin Matuska <mm@FreeBSD.org>
Date:   Fri Jun 3 17:58:39 2022 +0200

    zfs: merge openzfs/zfs@b9d98453f

    Notable upstream pull request merges:
      openzfs#12321 Fix inflated quiesce time caused by lwb_tx during zil_commit()
      openzfs#13244 zstd early abort
      openzfs#13360 Verify BPs as part of spa_load_verify_cb()
      openzfs#13452 More speculative prefetcher improvements
      openzfs#13466 Expose zpool guids through kstats
      openzfs#13476 Refactor Log Size Limit
      openzfs#13484 FreeBSD: libspl: Add locking around statfs globals
      openzfs#13498 Cancel in-progress rebuilds when we finish removal
      openzfs#13499 zed: Take no action on scrub/resilver checksum errors
      openzfs#13513 Remove wrong assertion in log spacemap

    Obtained from:  OpenZFS
    OpenZFS commit: b9d9845
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants