Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't preload at or after final dirty txg #9216

Closed
wants to merge 2 commits into from

Conversation

pcd1193182
Copy link
Contributor

Motivation and Context

See Issue #9186

Description

We prevent preloading at or after the final dirty txg

How Has This Been Tested?

Passed the zfs-test suite once, needs a few more runs to verify that it fixes the issue (hoping that the automated PR test runs will help).

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Aug 27, 2019
module/zfs/metaslab.c Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Aug 27, 2019

Codecov Report

Merging #9216 into master will decrease coverage by 6.3%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9216      +/-   ##
==========================================
- Coverage   79.09%   72.79%   -6.31%     
==========================================
  Files         400      371      -29     
  Lines      122002   119432    -2570     
==========================================
- Hits        96498    86937    -9561     
- Misses      25504    32495    +6991
Flag Coverage Δ
#kernel 72.43% <100%> (-7.27%) ⬇️
#user 63.52% <100%> (-3.36%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 142f84d...e84d035. Read the comment docs.

@behlendorf
Copy link
Contributor

In order to get a clean test run from the CI, it's probably going to be necessary to also cherry-pick the proposed fix from #9209 in to this PR.

pcd1193182 and others added 2 commits August 27, 2019 09:39
If a pool enables the SPACEMAP_HISTOGRAM feature shortly before being
exported, we can enter a race condition that causes a kernel panic. If the
metaslab hasn't been upgraded yet, we mark it for condensing. It can then be
preloaded so that it can be condensed. If this happens during the final dirty
txg, the metaslab will be dirtied by the condensing process and then will be
sycned after the final dirty txg, causing the kernel panic. The solution is to
forbid preloading metaslabs at or after the final dirty txg; this makes sense
in any case, because we shouldn't need to dirty any more metaslabs at or after
that point.

Signed-off-by: Paul Dagnelie <pcd@delphix.com>
`metaslab_verify_weight_and_frag()` a verification function and
by the end of it there shouldn't be any side-effects.

The function calls `metaslab_weight()` which in turn calls
`metaslab_set_fragmentation()`. The latter can dirty and otherwise
not dirty metaslab fro the next TXGand set `metaslab_condense_wanted`
if the spacemaps were just upgraded (meaning we just enabled the
SPACEMAP_HISTOGRAM feature through upgrade).

This patch ensures that metaslabs like these are skipped thus
avoiding that problem. We could also get rid of that function
completely but I hesitated because it has caught issues during
development of other features in the past.

Fixing this issue should also help with with most failures that
issue openzfs#9186 has been causing to the test-bots recently.

Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
* those metaslabs.
*/
if (msp->ms_sm->sm_dbuf->db_size != sizeof (space_map_phys_t))
return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment explains this well, but this seems like this is exposing a poorly-designed corner of the code. Could we do something like pass a nodirty flag to metaslab_weight(), or have a metaslab_weight_impl(), which just recalculates the weight and doesn't check about condensing? That way, if metaslab_weight() decides to mutate the metaslab due to some other condition, the relevant code will be close by, and we won't have to add another check here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

realized this is actually part of #9209. copied my comment over there. This commit is good to go.

@pcd1193182
Copy link
Contributor Author

This PR is being closed because this fix doesn't do enough to solve the actual issue; another PR will be opened shortly with a better fix.

@pcd1193182 pcd1193182 closed this Aug 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Code Review Needed Ready for review and testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants