disabled resilver_defer feature leads to looping resilvers #9299
Comments
thanks for update!
|
Thanks for digging in to this, the proposed fix makes sense to me but let's get @tcaputi's thoughts on it. It would be nice to add a test to the ZTS which covers this case. |
I agree, a test would be great. Let me know if you have any thoughts about how we can do this sort of regression testing in an automated fashion. I don't think we're set up well for this in illumos yet. |
I'd suggest adapting one of the existing ZTS redundancy tests for this or just writing a new one. To automate this I'd suggest using |
@behlendorf you can't reproduce this issue with the new code by disabling the feature at pool creation time. To reproduce it, you have to create a pool with old ZFS code and then import it on a system running new ZFS code that has the resilver_defer feature available. That is my experience with this issue, but maybe Kody has other examples. |
I see, I was under the impression the feature merely needed to be disabled. In the past we've handled this on a case-by-case basis by including a tiny pool created with the relevant code. |
I believe that the proposed fix should solve the problem and is correct. Thanks @KodyKantor for the help here. |
Great, thanks. I'll see if I can put a test together for this as well. |
@KodyKantor how has testing for this come along? We would like to start using this patch if it's correct. |
Testing has been mixed. It appears Igor may be correct that using the `-d` flag prevents the issue from occurring, but I'm not sure why that is. I also haven't been able to make a pool small enough to check in. I'll get back to trying to write a good test for this tomorrow morning. If I can't figure something out by mid-day I'll just submit a PR as-is. |
When a disk is replaced with another on a pool with the resilver_defer feature present but not enabled, the resilver activity restarts during each spa_sync. This patch checks to make sure that the resilver_defer feature is first enabled before requesting a deferred resilver. This was originally fixed in illumos-joyent as OS-7982. Closes openzfs#9299.
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Signed-off-by: Kody A Kantor <kody@kkantor.com>
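The check described in the patch summary can be modeled roughly as follows. This is a minimal, self-contained sketch and not the actual patch: `pool_state_t`, `resilver_action_t`, and `choose_resilver_action` are hypothetical names invented for illustration, and the fallback of restarting the resilver when the feature is not enabled is an assumption rather than something stated in this thread.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for pool state. These are NOT the real
 * spa_t or vdev_t structures from the ZFS source.
 */
typedef enum {
	ACTION_NONE,             /* the vdev does not need a resilver */
	ACTION_DEFER_RESILVER,   /* queue behind the running resilver */
	ACTION_RESTART_RESILVER  /* request a normal resilver (re)start */
} resilver_action_t;

typedef struct {
	bool resilver_in_progress;   /* a resilver is already running */
	bool resilver_defer_enabled; /* feature is enabled on this pool */
} pool_state_t;

/*
 * Decide what to do when a vdev may need resilvering. A deferred
 * resilver is only requested when the resilver_defer feature is
 * enabled; otherwise (assumed here) fall back to a normal request.
 */
resilver_action_t
choose_resilver_action(const pool_state_t *pool, bool vdev_needs_resilver)
{
	if (!vdev_needs_resilver)
		return (ACTION_NONE);

	if (pool->resilver_in_progress && pool->resilver_defer_enabled)
		return (ACTION_DEFER_RESILVER);

	return (ACTION_RESTART_RESILVER);
}
```

In this model, a pool created by older code (feature present in the software but not enabled on the pool) never takes the deferred path, which is the condition the patch summary says must hold before a deferred resilver is requested.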
I submitted a PR for this. I didn't have time to get back to trying to write a test for this, sorry about that. I'll be away from my computer until September 30th, but by all means feel free to change this or integrate it in the meantime. If this is still up on the 30th I'll continue pushing this through. |
I haven't tested the update in the PR, but I have prepared steps to reproduce the resilver loop.
I can see the resilver looping; it never finishes. |
Just tested the proposed update based on TritonDataCenter/illumos-joyent@b67d873: it fixed the resilvering loop. |
When a disk is replaced with another on a pool with the resilver_defer feature present but not enabled, the resilver activity restarts during each spa_sync. This patch checks to make sure that the resilver_defer feature is first enabled before requesting a deferred resilver. This was originally fixed in illumos-joyent as OS-7982.
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Signed-off-by: Kody A Kantor <kody@kkantor.com>
External-issue: illumos-joyent OS-7982
Closes #9299
Closes #9338
System information
smartos 20190718T005708Z
Describe the problem you're observing
When a disk is replaced with another on a pool with the resilver_defer feature present but not enabled, the resilver activity restarts during each spa_sync.
Describe how to reproduce the problem
The problem is described further here: https://smartos.org/bugview/OS-7982
A patch is available for SmartOS: TritonDataCenter/illumos-joyent@b67d873
After soaking in SmartOS for a short time, the change will be upstreamed (including more code review) to illumos-gate as well.
This problem was also briefly described recently in #840, notably in @chrisrd's comment. It merits a separate ticket since the original issue reported in #840 is separate from this one. If there's already a ticket open about this, then we can close this ticket.
As noted in #840, upgrading the pool is a workaround for this issue for those willing to do so.