-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't preload at or after final dirty txg #9216
Conversation
Codecov Report
@@ Coverage Diff @@
## master #9216 +/- ##
==========================================
- Coverage 79.09% 72.79% -6.31%
==========================================
Files 400 371 -29
Lines 122002 119432 -2570
==========================================
- Hits 96498 86937 -9561
- Misses 25504 32495 +6991
Continue to review full report at Codecov.
|
In order to get a clean test run from the CI, it's probably going to be necessary to also cherry-pick the proposed fix from #9209 in to this PR. |
If a pool enables the SPACEMAP_HISTOGRAM feature shortly before being exported, we can enter a race condition that causes a kernel panic. If the metaslab hasn't been upgraded yet, we mark it for condensing. It can then be preloaded so that it can be condensed. If this happens during the final dirty txg, the metaslab will be dirtied by the condensing process and then will be sycned after the final dirty txg, causing the kernel panic. The solution is to forbid preloading metaslabs at or after the final dirty txg; this makes sense in any case, because we shouldn't need to dirty any more metaslabs at or after that point. Signed-off-by: Paul Dagnelie <pcd@delphix.com>
`metaslab_verify_weight_and_frag()` a verification function and by the end of it there shouldn't be any side-effects. The function calls `metaslab_weight()` which in turn calls `metaslab_set_fragmentation()`. The latter can dirty and otherwise not dirty metaslab fro the next TXGand set `metaslab_condense_wanted` if the spacemaps were just upgraded (meaning we just enabled the SPACEMAP_HISTOGRAM feature through upgrade). This patch ensures that metaslabs like these are skipped thus avoiding that problem. We could also get rid of that function completely but I hesitated because it has caught issues during development of other features in the past. Fixing this issue should also help with with most failures that issue openzfs#9186 has been causing to the test-bots recently. Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
* those metaslabs. | ||
*/ | ||
if (msp->ms_sm->sm_dbuf->db_size != sizeof (space_map_phys_t)) | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment explains this well, but this seems like this is exposing a poorly-designed corner of the code. Could we do something like pass a nodirty
flag to metaslab_weight(), or have a metaslab_weight_impl(), which just recalculates the weight and doesn't check about condensing? That way, if metaslab_weight() decides to mutate the metaslab due to some other condition, the relevant code will be close by, and we won't have to add another check here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
realized this is actually part of #9209. copied my comment over there. This commit is good to go.
This PR is being closed because this fix doesn't do enough to solve the actual issue; another PR will be opened shortly with a better fix. |
Motivation and Context
See Issue #9186
Description
We prevent preloading at or after the final dirty txg
How Has This Been Tested?
Passed the zfs-test suite once, needs a few more runs to verify that it fixes the issue (hoping that the automated PR test runs will help).
Types of changes
Checklist:
Signed-off-by
.