Don't preload at or after final dirty txg #9216

pcd1193182 · 2019-08-26T22:41:37Z

Motivation and Context

See Issue #9186

Description

We prevent preloading metaslabs at or after the final dirty txg.

How Has This Been Tested?

Passed the zfs-test suite once; it needs a few more runs to verify that it fixes the issue (hoping that the automated PR test runs will help).

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the ZFS on Linux code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
All new and existing tests passed.
All commit messages are properly formatted and contain Signed-off-by.

module/zfs/metaslab.c

codecov · 2019-08-27T07:30:38Z

Codecov Report

Merging #9216 into master will decrease coverage by 6.3%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #9216      +/-   ##
==========================================
- Coverage   79.09%   72.79%   -6.31%     
==========================================
  Files         400      371      -29     
  Lines      122002   119432    -2570     
==========================================
- Hits        96498    86937    -9561     
- Misses      25504    32495    +6991

Flag	Coverage Δ
#kernel	`72.43% <100%> (-7.27%)`	⬇️
#user	`63.52% <100%> (-3.36%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 142f84d...e84d035.

behlendorf · 2019-08-27T15:59:44Z

In order to get a clean test run from the CI, it's probably going to be necessary to also cherry-pick the proposed fix from #9209 into this PR.

If a pool enables the SPACEMAP_HISTOGRAM feature shortly before being exported, we can enter a race condition that causes a kernel panic. If the metaslab hasn't been upgraded yet, we mark it for condensing. It can then be preloaded so that it can be condensed. If this happens during the final dirty txg, the metaslab will be dirtied by the condensing process and then will be synced after the final dirty txg, causing the kernel panic. The solution is to forbid preloading metaslabs at or after the final dirty txg; this makes sense in any case, because we shouldn't need to dirty any more metaslabs at or after that point.

Signed-off-by: Paul Dagnelie <pcd@delphix.com>
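The guard described in this commit message can be illustrated with a toy model. This is only a sketch under stated assumptions, not the patch itself: `toy_pool_t`, its fields, and `toy_preload_allowed()` are hypothetical stand-ins invented for illustration, and the real code operates on the pool and metaslab group structures in module/zfs/metaslab.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the pool state; not the real spa_t. */
typedef struct toy_pool {
	uint64_t tp_syncing_txg;	/* txg currently being synced */
	uint64_t tp_final_dirty_txg;	/* last txg allowed to dirty data */
} toy_pool_t;

/*
 * Return true only while the syncing txg is strictly before the final
 * dirty txg. "At or after" the final dirty txg, preloading is refused,
 * so no metaslab can be loaded, condensed and dirtied too late.
 */
static bool
toy_preload_allowed(const toy_pool_t *tp)
{
	return (tp->tp_syncing_txg < tp->tp_final_dirty_txg);
}
```

With a final dirty txg of 100, the check allows preloading at txg 99 and refuses it at txg 100 and at every later txg.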

`metaslab_verify_weight_and_frag()` is a verification function, and by the end of it there shouldn't be any side effects. The function calls `metaslab_weight()`, which in turn calls `metaslab_set_fragmentation()`. The latter can dirty an otherwise clean metaslab for the next TXG and set `metaslab_condense_wanted` if the spacemaps were just upgraded (meaning we just enabled the SPACEMAP_HISTOGRAM feature through upgrade). This patch ensures that metaslabs like these are skipped, thus avoiding that problem. We could also get rid of that function completely, but I hesitated because it has caught issues during development of other features in the past. Fixing this issue should also help with most of the failures that issue openzfs#9186 has been causing on the test bots recently.

Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>

ahrens · 2019-08-28T15:12:41Z

module/zfs/metaslab.c

+	 * those metaslabs.
+	 */
+	if (msp->ms_sm->sm_dbuf->db_size != sizeof (space_map_phys_t))
+		return;


The comment explains this well, but this seems to be exposing a poorly designed corner of the code. Could we do something like pass a nodirty flag to metaslab_weight(), or have a metaslab_weight_impl() that just recalculates the weight and doesn't check whether condensing is needed? That way, if metaslab_weight() decides to mutate the metaslab due to some other condition, the relevant code will be close by, and we won't have to add another check here.
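One possible shape for that suggestion, as a rough sketch only: the types and functions below (`toy_metaslab_t`, `toy_weight_impl()`, `toy_weight()`, `toy_verify_weight()`) are hypothetical simplifications made up for illustration and do not reflect the real `metaslab_t` or the actual weight calculation. The weight here is simply the free space.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified metaslab; not the real metaslab_t. */
typedef struct toy_metaslab {
	uint64_t tm_free;		/* free space, used as the weight */
	bool tm_needs_upgrade;		/* space map lacks a histogram */
	bool tm_condense_wanted;	/* set when condensing is requested */
	bool tm_dirty;			/* set when dirtied for the next txg */
} toy_metaslab_t;

/*
 * Compute the weight. The metaslab is mutated only when nodirty is
 * false, so every side effect lives next to the flag that disables it.
 */
static uint64_t
toy_weight_impl(toy_metaslab_t *tm, bool nodirty)
{
	if (tm->tm_needs_upgrade && !nodirty) {
		tm->tm_condense_wanted = true;
		tm->tm_dirty = true;
	}
	return (tm->tm_free);
}

/* Normal path: may request condensing and dirty the metaslab. */
static uint64_t
toy_weight(toy_metaslab_t *tm)
{
	return (toy_weight_impl(tm, false));
}

/* Verification path: recomputes the weight without side effects. */
static bool
toy_verify_weight(toy_metaslab_t *tm, uint64_t expected)
{
	return (toy_weight_impl(tm, true) == expected);
}
```

In this sketch the verification path needs no check of its own on the space map size, because it can never reach the mutating branch.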

Realized this is actually part of #9209. Copied my comment over there. This commit is good to go.

pcd1193182 · 2019-08-28T23:51:06Z

This PR is being closed because this fix doesn't do enough to solve the actual issue; another PR will be opened shortly with a better fix.

pcd1193182 requested review from behlendorf, ahrens and sdimitro August 26, 2019 22:41

behlendorf added the Status: Code Review Needed Ready for review and testing label Aug 27, 2019

ahrens reviewed Aug 27, 2019

pcd1193182 and others added 2 commits August 27, 2019 09:39

pcd1193182 force-pushed the final_txg branch from bda0387 to e84d035 Compare August 27, 2019 16:41

behlendorf approved these changes Aug 27, 2019

ahrens reviewed Aug 28, 2019

ahrens approved these changes Aug 28, 2019

pcd1193182 closed this Aug 28, 2019
