metaslab_verify_weight_and_frag() can set ms_condense_wanted and dirty the metaslab as a side-effect #9185
Labels
Type: Defect
Incorrect behavior (e.g. crash, hang)
Comments
behlendorf added a commit to behlendorf/zfs that referenced this issue on Aug 28, 2019
Until issues openzfs#9185 and openzfs#9186 have been resolved the following zpool upgrade tests are being disabled to prevent CI failures. zpool_upgrade_002_pos, zpool_upgrade_003_pos, zpool_upgrade_004_pos, zpool_upgrade_007_pos, zpool_upgrade_008_pos Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
behlendorf added a commit that referenced this issue on Aug 28, 2019
Until issues #9185 and #9186 have been resolved the following zpool upgrade tests are being disabled to prevent CI failures. zpool_upgrade_002_pos, zpool_upgrade_003_pos, zpool_upgrade_004_pos, zpool_upgrade_007_pos, zpool_upgrade_008_pos Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #9185 Issue #9186 Closes #9225
This issue shouldn't be closed, as this unexpected side-effect still exists.
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue on Dec 24, 2019
If a pool enables the SPACEMAP_HISTOGRAM feature shortly before being exported, we can enter a situation that causes a kernel panic. Any metaslabs that are loaded during the final dirty txg and haven't already been condensed will cause metaslab_sync to proceed after the final dirty txg so that the condense can be performed, which there are assertions to prevent. Because of the nature of this issue, there are a number of ways we can enter this state. Rather than try to prevent each of them one by one, potentially missing some edge cases, we instead cut it off at the point of intersection; by preventing metaslab_sync from proceeding if it would only do so to perform a condense and we're past the final dirty txg, we preserve the utility of the existing asserts while preventing this particular issue. Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes openzfs#9185 Closes openzfs#9186 Closes openzfs#9231 Closes openzfs#9253
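The guard described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the actual `metaslab_sync` code: the `sync_inputs_t` struct, its fields, and `sketch_should_sync()` are hypothetical names invented here to show the shape of the check.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified inputs to the decision. None of these names
 * come from the ZFS source; they only model the conditions the commit
 * message describes.
 */
typedef struct sync_inputs {
	bool has_pending_changes;	/* allocations or frees to write out */
	bool condense_wanted;		/* a condense was requested */
	uint64_t txg;			/* txg currently being synced */
	uint64_t final_dirty_txg;	/* last txg allowed to dirty data */
} sync_inputs_t;

/* Return true if the sync pass should do any work for this metaslab. */
static bool
sketch_should_sync(const sync_inputs_t *in)
{
	/*
	 * Real pending changes always proceed, so any assertions about
	 * dirtying data past the final dirty txg still get a chance to fire.
	 */
	if (in->has_pending_changes)
		return (true);

	/* A condense alone only justifies syncing up to the final dirty txg. */
	return (in->condense_wanted && in->txg <= in->final_dirty_txg);
}
```

With this shape, a metaslab whose only reason to sync is a wanted condense is skipped once the txg is past the final dirty txg, while genuine pending changes are still let through.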
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue on Dec 27, 2019
If a pool enables the SPACEMAP_HISTOGRAM feature shortly before being exported, we can enter a situation that causes a kernel panic. Any metaslabs that are loaded during the final dirty txg and haven't already been condensed will cause metaslab_sync to proceed after the final dirty txg so that the condense can be performed, which there are assertions to prevent. Because of the nature of this issue, there are a number of ways we can enter this state. Rather than try to prevent each of them one by one, potentially missing some edge cases, we instead cut it off at the point of intersection; by preventing metaslab_sync from proceeding if it would only do so to perform a condense and we're past the final dirty txg, we preserve the utility of the existing asserts while preventing this particular issue. Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes openzfs#9185 Closes openzfs#9186 Closes openzfs#9231 Closes openzfs#9253
tonyhutter pushed a commit that referenced this issue on Jan 23, 2020
If a pool enables the SPACEMAP_HISTOGRAM feature shortly before being exported, we can enter a situation that causes a kernel panic. Any metaslabs that are loaded during the final dirty txg and haven't already been condensed will cause metaslab_sync to proceed after the final dirty txg so that the condense can be performed, which there are assertions to prevent. Because of the nature of this issue, there are a number of ways we can enter this state. Rather than try to prevent each of them one by one, potentially missing some edge cases, we instead cut it off at the point of intersection; by preventing metaslab_sync from proceeding if it would only do so to perform a condense and we're past the final dirty txg, we preserve the utility of the existing asserts while preventing this particular issue. Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #9185 Closes #9186 Closes #9231 Closes #9253
allanjude pushed a commit to KlaraSystems/zfs that referenced this issue on Apr 28, 2020
If a pool enables the SPACEMAP_HISTOGRAM feature shortly before being exported, we can enter a situation that causes a kernel panic. Any metaslabs that are loaded during the final dirty txg and haven't already been condensed will cause metaslab_sync to proceed after the final dirty txg so that the condense can be performed, which there are assertions to prevent. Because of the nature of this issue, there are a number of ways we can enter this state. Rather than try to prevent each of them one by one, potentially missing some edge cases, we instead cut it off at the point of intersection; by preventing metaslab_sync from proceeding if it would only do so to perform a condense and we're past the final dirty txg, we preserve the utility of the existing asserts while preventing this particular issue. Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes openzfs#9185 Closes openzfs#9186 Closes openzfs#9231 Closes openzfs#9253
allanjude pushed a commit to KlaraSystems/zfs that referenced this issue on Apr 28, 2020
`metaslab_verify_weight_and_frag()` is a verification function, and by the end of it there shouldn't be any side-effects. The function calls `metaslab_weight()`, which in turn calls `metaslab_set_fragmentation()`. The latter can dirty an otherwise non-dirty metaslab for the next TXG and set `metaslab_condense_wanted` if the spacemaps were just upgraded (meaning we just enabled the SPACEMAP_HISTOGRAM feature through upgrade). This patch adds a new flag as a parameter to `metaslab_weight()` and `metaslab_set_fragmentation()`, making the dirtying of the metaslab optional. Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes openzfs#9185 Closes openzfs#9282
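The flag-parameter approach from this commit message can be illustrated with a toy model. This is a sketch under stated assumptions, not the real patch: `sketch_metaslab_t`, its fields, the `nodirty` parameter name, and the toy weight formula are all hypothetical and only show how a caller can opt out of the mutation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for a metaslab. */
typedef struct sketch_metaslab {
	uint64_t cached_weight;		/* weight stored at last update */
	uint64_t free_bytes;		/* toy input to the weight */
	bool just_upgraded;		/* space map was just upgraded */
	bool condense_wanted;		/* condense requested for next sync */
	bool dirty;			/* queued for the next TXG */
} sketch_metaslab_t;

/*
 * Toy weight computation. When nodirty is true the function is purely
 * read-only; otherwise it may request a condense and dirty the metaslab.
 */
static uint64_t
sketch_weight(sketch_metaslab_t *msp, bool nodirty)
{
	if (msp->just_upgraded && !nodirty) {
		msp->condense_wanted = true;
		msp->dirty = true;
	}
	return (msp->free_bytes);	/* toy formula, an assumption */
}

/* Verification path: recompute with nodirty set, so nothing is mutated. */
static void
sketch_verify_weight(sketch_metaslab_t *msp)
{
	uint64_t recomputed = sketch_weight(msp, true);

	assert(recomputed == msp->cached_weight);
	(void) recomputed;	/* avoid an unused warning under NDEBUG */
}
```

Regular callers would pass `false` and keep the existing behavior, while the verification path passes `true` and leaves the metaslab untouched.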
`metaslab_verify_weight_and_frag()` is just a verification function, and by the end of it there shouldn't be any side-effects. The function calls `metaslab_weight()`, which in turn calls `metaslab_set_fragmentation()`. The latter can set `ms_condense_wanted` to true and dirty the metaslab for the next TXG if the spacemaps were just upgraded (meaning we had the `SPACEMAP_HISTOGRAM` feature disabled and then enabled it as part of `zpool upgrade`).

We should find a way to ensure that these mutations either do not happen as part of the verification or are reset by the time the verification function is done. We could even try to get rid of this code (or at least some of it), but I'd personally be a bit hesitant due to the bugs that its checks have found in the past.
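As a rough illustration of the "reset by the time the verification function is done" option, the toy model below snapshots the affected state before recomputing the weight and restores it afterwards. Every name here (`toy_ms_t`, `toy_weight()`, `toy_verify()`) is hypothetical, and the sketch assumes the mutation is limited to two flags; in the real code, undoing a dirty may involve more than resetting a field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy metaslab; not the real structure. */
typedef struct toy_ms {
	uint64_t cached_weight;		/* weight stored at last update */
	uint64_t free_bytes;		/* toy input to the weight */
	bool just_upgraded;		/* space map was just upgraded */
	bool condense_wanted;		/* condense requested for next sync */
	bool dirty;			/* queued for the next TXG */
} toy_ms_t;

/* Toy weight computation that mutates state as a side-effect. */
static uint64_t
toy_weight(toy_ms_t *ms)
{
	if (ms->just_upgraded) {
		ms->condense_wanted = true;
		ms->dirty = true;
	}
	return (ms->free_bytes);	/* toy formula, an assumption */
}

/* Verify the cached weight, then restore any state the check touched. */
static void
toy_verify(toy_ms_t *ms)
{
	bool saved_condense = ms->condense_wanted;
	bool saved_dirty = ms->dirty;
	uint64_t recomputed = toy_weight(ms);

	assert(recomputed == ms->cached_weight);
	(void) recomputed;	/* avoid an unused warning under NDEBUG */

	ms->condense_wanted = saved_condense;
	ms->dirty = saved_dirty;
}
```

The snapshot-and-restore shape keeps the weight function unchanged, at the cost of the verifier needing to know every field the callee might touch.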