Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Undirty removed spill blocks. #2884

Closed
wants to merge 1 commit into from
Closed

Conversation

dweeezil
Copy link
Contributor

If a spill block's dbuf hasn't yet been written when a the spill block
is removed, the unwritten version will still be written. This patch
undirties the dbuf in this case to prevent it to be written.

The most common case in which this could happen is when xattr=sa is being
used and a long xattr is immediately replaced by a short xattr as in:

setfattr -n user.test -v very_very_very..._long_value  <file>
setfattr -n user.test -v short_value  <file>

The first value must be sufficiently long that a spill block is generated
and the second value must be short enough to not require a spill block.
In practice, this would typically happen due to internal xattr operations
as a result of setting acltype=posixacl.

@dweeezil
Copy link
Contributor Author

This should fix the recently reported cases of dnode corruption which typically occur when xattr=sa and acltype=posixacl are set:
#2663
#2700
#2717
#2701
#2863
and maybe others.

@dweeezil
Copy link
Contributor Author

A little more explanation is likely in order because this problem doesn't necessarily directly cause the problems seen in the referenced issues. This problem will cause an obsolete version of an SA (typically an SA xattr) to be stored in a spill block simultaneously with a newer version of the same SA in the dnode (which ends up corrupted). This corruption is often harmless and manifests itself as (typically the) SA xattr being "invisible". Subsequent modifications to the dnode, however, will generally lead to more severe corruption and it's these latter stages that appear to have happened in some cases.

@b333z
Copy link
Contributor

b333z commented Nov 11, 2014

wow, amazing work @dweeezil...

@tomposmiko
Copy link

+1, amazing work

@dweeezil
What can we do with already corrupted files?
Should we just remove them, or scrub will fix after applying the patch, or do we need to recreate the filesystem?

@dweeezil
Copy link
Contributor Author

@tomposmiko If you can remove the corrupted files, you should be OK. Unfortunately, there are a number of different types of corruption resulting from follow-on access to the bug. During my testing, I was able to create corrupted filesystems which in some cases would cause the entire pool to become un-importable and in other cases would cause the filesystem to be un-deletable. I'd say that if you can remove the corrupted files, you should be pretty safe with the possible fallout being the potential of a bit of leaked space.

Due to my inability to recreate the problem as well as the number of problem reports, I had started working on a gross hack to work around the exact type of corruption seen in #2863. One of my hopes was that working on a workaround would give me an idea as to the nature of the problem. That is, in fact, exactly what happened. I don't plan on working on the workaround any more because I'm not convinced it can be implemented very reliably.

@dweeezil
Copy link
Contributor Author

@behlendorf Yes, that's a better structure. I was so happy to get to the bottom of this problem that I kinda just jammed it into the "early return" part because I knew it would work there and had intended on looking into a cleaner implementation once the overall concept got some review.

@behlendorf
Copy link
Contributor

@dweeezil I completely understand! The buildbot was done for a few days but it's back online now and should get your patch tested in the next day or so. If you're happy with the suggested restructuring you could refresh the pull request after giving it a good review to make sure I didn't overlook something.

@dweeezil
Copy link
Contributor Author

I've restructured the code as suggested by @behlendorf. Interestingly, a version of the code in which I had introduced a lot of instrumentation, I did use a flag to indicate whether a spill was being freed but I called it freespill. In homage to that debugging code, I've used the same name. This patch does pass my test case.

If a spill block's dbuf hasn't yet been written when a spill block is
freed, the unwritten version will still be written.  This patch handles
the case in which a spill block's dbuf is freed and undirties it to
prevent it from being written.

The most common case in which this could happen is when xattr=sa is being
used and a long xattr is immediately replaced by a short xattr as in:

	setfattr -n user.test -v very_very_very..._long_value  <file>
	setfattr -n user.test -v short_value  <file>

The first value must be sufficiently long that a spill block is generated
and the second value must be short enough to not require a spill block.
In practice, this would typically happen due to internal xattr operations
as a result of setting acltype=posixacl.
@behlendorf
Copy link
Contributor

This looks good to me. I like freespill better as a variable name too, isspill was hard to read. Once the buildbot has finished testing this change I'll get it merged.

@behlendorf
Copy link
Contributor

Thanks everybody for getting to the bottom of this. It's been merged to master as:

4254acb Undirty freed spill blocks.

ryao pushed a commit to ryao/zfs that referenced this pull request Nov 29, 2014
If a spill block's dbuf hasn't yet been written when a spill block is
freed, the unwritten version will still be written.  This patch handles
the case in which a spill block's dbuf is freed and undirties it to
prevent it from being written.

The most common case in which this could happen is when xattr=sa is being
used and a long xattr is immediately replaced by a short xattr as in:

	setfattr -n user.test -v very_very_very..._long_value  <file>
	setfattr -n user.test -v short_value  <file>

The first value must be sufficiently long that a spill block is generated
and the second value must be short enough to not require a spill block.
In practice, this would typically happen due to internal xattr operations
as a result of setting acltype=posixacl.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#2663
Closes openzfs#2700
Closes openzfs#2701
Closes openzfs#2717
Closes openzfs#2863
Closes openzfs#2884

Conflicts:
	module/zfs/dbuf.c
behlendorf pushed a commit that referenced this pull request Dec 23, 2014
If a spill block's dbuf hasn't yet been written when a spill block is
freed, the unwritten version will still be written.  This patch handles
the case in which a spill block's dbuf is freed and undirties it to
prevent it from being written.

The most common case in which this could happen is when xattr=sa is being
used and a long xattr is immediately replaced by a short xattr as in:

	setfattr -n user.test -v very_very_very..._long_value  <file>
	setfattr -n user.test -v short_value  <file>

The first value must be sufficiently long that a spill block is generated
and the second value must be short enough to not require a spill block.
In practice, this would typically happen due to internal xattr operations
as a result of setting acltype=posixacl.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2663
Closes #2700
Closes #2701
Closes #2717
Closes #2863
Closes #2884
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants