ztest: segfault when hitting race in metaslab_enable #8602
Comments
This might still be the case with the new B-tree, but it seems somewhat obfuscated:
I see a similar issue on DilOS from time to time too:
Any running 'zpool initialize' or TRIM must be cancelled prior to the vdev_metaslab_fini() call in spa_vdev_remove_log() which will unload the metaslabs and set ms->ms_group == NULL. TEST_ZTEST_TIMEOUT=7200 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#8602
Any running 'zpool initialize' or TRIM must be cancelled prior to the vdev_metaslab_fini() call in spa_vdev_remove_log() which will unload the metaslabs and set ms->ms_group == NULL. Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8602 Closes #9751
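The ordering described in the commit message can be sketched with a toy model. This is a minimal illustration in Python, not the ZFS implementation: the `Vdev` and `Metaslab` classes, the stop event, and the function bodies are all hypothetical stand-ins, and only the ordering (cancel the background work, then tear down the metaslabs) is taken from the commit message.

```python
import threading


class Metaslab:
    def __init__(self, group):
        self.group = group  # stands in for ms->ms_group


class Vdev:
    def __init__(self, n_metaslabs):
        self.metaslabs = [Metaslab(group="mg") for _ in range(n_metaslabs)]
        self.stop = threading.Event()
        self.errors = []
        self.worker = None


def initialize_worker(vd):
    # Hypothetical background job: keeps walking the metaslabs until
    # asked to stop, the way an initialize or TRIM would.
    while not vd.stop.is_set():
        for ms in vd.metaslabs:
            if ms.group is None:
                vd.errors.append("touched a torn-down metaslab")
                return
        vd.stop.wait(0.001)


def start_initialize(vd):
    vd.worker = threading.Thread(target=initialize_worker, args=(vd,))
    vd.worker.start()


def cancel_initialize(vd):
    # Signal the worker and wait until it has really exited.
    vd.stop.set()
    if vd.worker is not None:
        vd.worker.join()


def metaslab_fini(vd):
    # Models the teardown that leaves ms_group == NULL.
    for ms in vd.metaslabs:
        ms.group = None


def remove_log(vd):
    cancel_initialize(vd)  # must happen before the teardown
    metaslab_fini(vd)


vd = Vdev(4)
start_initialize(vd)
remove_log(vd)
print(vd.errors)  # prints []
```

Because `cancel_initialize` joins the worker before `metaslab_fini` runs, the worker can never observe a cleared group in this model. Swapping the two calls in `remove_log` would reopen the window.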
Reopening: this issue has not been entirely resolved and has been observed in the latest code from June.
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
Let's not stale this one...
Closing. This was resolved by 793c958.
System information
Describe the problem you're observing
After running ztest over the weekend, I hit a segfault:
I was able to verify that the segfault occurred because the metaslab's ms_group was NULL:

This implies that we were trying to initialize a vdev that had an unpopulated metaslab. In other cases, we have seen issues with races between initialize and removal, so I looked at the other stacks and saw that a removal was in progress:
The pointer to the vdev being initialized is 0x55d7cacc0050 and the pointer to the one being removed is 0x55d7cab7d000. Looking at the vdev configuration from the pool, we can see that the one being initialized belongs to the second mirror (which is the one we're trying to remove):

So it seems we're hitting a race where we initialize a vdev while we're removing its parent.
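To make the suspected interleaving concrete, here is a small deterministic model of it. It is only a sketch in Python under stated assumptions: the classes and the bodies of `metaslab_enable` and `unload_metaslabs` are hypothetical stand-ins rather than ZFS code, and a Python `AttributeError` plays the role of the NULL dereference.

```python
class MetaslabGroup:
    def __init__(self):
        self.enabled = False


class Metaslab:
    def __init__(self):
        self.group = MetaslabGroup()  # stands in for ms->ms_group


def metaslab_enable(ms):
    # Dereferences the group; fails if removal already cleared it.
    ms.group.enabled = True


def unload_metaslabs(metaslabs):
    # Models removal leaving ms_group == NULL on every metaslab.
    for ms in metaslabs:
        ms.group = None


def run(interleaving):
    """Apply the named steps in order and report whether they crash."""
    metaslabs = [Metaslab() for _ in range(2)]
    steps = {
        "initialize": lambda: [metaslab_enable(ms) for ms in metaslabs],
        "remove": lambda: unload_metaslabs(metaslabs),
    }
    try:
        for name in interleaving:
            steps[name]()
    except AttributeError:
        return "crash"
    return "ok"


print(run(["initialize", "remove"]))  # prints ok
print(run(["remove", "initialize"]))  # prints crash
```

Only the second ordering fails, which matches the picture above: the initialize step reaches a metaslab after the removal has already cleared its group.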