Fix EBUSY errors under overlayfs and v4.13+ kernels #34948
cc @rhvgoyal as the author of #22069, which this reverts, and @tonistiigi as the author of #27609, which I think makes the change in #22069 unnecessary. |
The powerpc failure is a flake. I don't know how to kick the jenkins to re-run |
@euank Looks like you forgot to sign-off the second commit 😅 also ping @dmcgowan @rhvgoyal @tonistiigi PTAL |
cc @amir73il Ok, so kernel commit Secondly, even if you make Making it I feel we can try to minimize mount point leaks but being able to take care of all corner cases will be very hard. So I am wondering if a solution should come from kernel side instead. I am wondering will it make sense to convert kernel error message to a warning message? Or is there a way where kernel can figure out that we are essentially using same set of lower/upper/work directories and instead of instantiating a new super block, it re-uses existing super block (which is around due to busy mount). Amir will know much more, I am ccing him. |
@rhvgoyal overlayfs is right to complain, so a kernel warning is not good enough. @tonistiigi The root cause of the problem IMO is the chroot() does not Unmount
|
That's not true. I can reliably reproduce a mount point leak to runc (leaked for 100ms to 1s) without this patch. With this patch I can't. It therefore still makes sense to try and avoid it.
Because the code here uses MNT_DETACH as of #33638 we're opting in to async behavior either way. I think that PR is technically wrong, but that's something to address separate from this.
False. runc makes the mountpoints rslave. See either my original issue I say this fixes and my discussion there or my pr against runc and the referenced defaulting code.
I disagree. If they do
A better solution would be something like @cyphar suggests imo.
A good first step there would be to not do "mount umount mount umount mount" to start a container, but instead just mount it once and leave it mounted.
Read my original issue (#34672) where I talk about the actual issue I'm trying to solve. There may be more issues, but the issue I'm trying to fix does not actually use any of that code so it's not closely related to what I see as the root problem. I agree the things you pointed out are problems (though I think the solution is making it |
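The "mount it once and leave it mounted" suggestion above can be sketched with a small reference counter: the first user triggers the mount and the last user triggers the unmount. This is only an illustration, not the driver's code. The names `refCounter`, `acquire`, and `release` are hypothetical, and the mount and unmount steps are injected as plain functions (an assumption of this sketch) so that it runs without root.

```go
package main

import (
	"fmt"
	"sync"
)

// refCounter is a hypothetical helper that tracks how many users hold
// each mount path.
type refCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newRefCounter() *refCounter {
	return &refCounter{counts: make(map[string]int)}
}

// acquire calls mount only when the first user asks for path.
func (c *refCounter) acquire(path string, mount func(string) error) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.counts[path] == 0 {
		if err := mount(path); err != nil {
			return err
		}
	}
	c.counts[path]++
	return nil
}

// release calls unmount only when the last user lets go of path.
func (c *refCounter) release(path string, unmount func(string) error) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.counts[path] == 0 {
		return fmt.Errorf("%s is not held", path)
	}
	c.counts[path]--
	if c.counts[path] > 0 {
		return nil
	}
	delete(c.counts, path)
	return unmount(path)
}

func main() {
	mounts, unmounts := 0, 0
	// Stand-ins for the real mount/unmount syscalls.
	mount := func(string) error { mounts++; return nil }
	unmount := func(string) error { unmounts++; return nil }

	rc := newRefCounter()
	for i := 0; i < 3; i++ {
		_ = rc.acquire("/merged", mount)
	}
	for i := 0; i < 3; i++ {
		_ = rc.release("/merged", unmount)
	}
	fmt.Println(mounts, unmounts) // prints "1 1"
}
```

Holding the lock across the mount call keeps the sketch simple; a real implementation would probably want per-path locking.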
Yeah, I am confused by all the different issues and fixes. Anyway, the solution of rslave and lazy umount in a cloned ns looks good. |
This PR is for fixing something else though. This is to fix the case where mounting Are you willing to review this change, or do you have specific questions about this issue that aren't already answered by my comments in #34672? |
I can only ACK the change that doesn't make the home dir private. I don't know enough about the big picture to say whether var/lib/docker/overlay is always shared to begin with. I suppose it makes little sense to make the home dir private for other graph drivers either? And the rslave change should be made to chroot() as well; otherwise this fix is correct but partial. You shouldn't fix just the problem you see in front of you when you know there is a bigger issue to solve. But it's fine with me if this is the first step. |
@amir73il I apologize if I came off a little roughly. The internet does make it hard to convey tone properly.
Thanks! What I'm looking for here is making sure this change is in the right direction
I think it generally will be, especially in a systemd world. It's possible this should be swapping
It might, but I'm not an expert. I do know there are some related issues floating around for devicemapper at the least.
Yup, happy to look at that and changing the MNT_DETACH bit later. One step at a time. |
Review bump for this change |
@amir73il Thinking more about it, it feels like a regression to me (from the userspace point of view). Sure, we can try modifying user space to not leak mount points, but a kernel upgrade will still break existing container runtimes. And making sure the setup is completely right and none of the mount points are leaking is hard. So if I take a device IMHO, either we need to implement the same behavior for overlay or reduce the level from error to warning. And this is irrespective of the docker changes. Sure, try to reduce the number of leaked mount points, that can only help. But it still feels like a regression. |
@amir73il In some ways it feels like a DoS to me. If an overlay mount point has leaked somewhere, then root cannot create a new overlay mount point and get to its data. Terminal 1
Terminal 2
Terminal 1
Now root can't access its own data due to a mount point leaked into some process's mount namespace. And that does not sound desirable to me (given how easy it is to leak mount points). |
@rhvgoyal wrt regression you are definitely right. This should be fixed in stable kernels. |
cc @kolyshkin |
In my limited testing, this patch indeed helps with EBUSY on /merged on removal, which I can reproduce easily on a RHEL 7.4 system with This is not the ultimate fix though, as I still see occasional EBUSY on /shm removal (which is somewhat
@kolyshkin For the reasons described in #34672, I also had to apply this runc patch to fix my issue: opencontainers/runc#1598 Is there a separate issue or bugzilla filed for the Also, review ping! It seems like this PR has stalled out, but I don't think I've gotten any actionable requests / reasons why it's stalled. @dmcgowan and @rhvgoyal, want to review this change? Even if overlayfs lets us get away with leaking mounts, I don't think we should be and so IMO this should still be merged. |
Intentionally leaving the I think I've addressed or replied to the review comments. |
LGTM except for a tiny nitpick
}
if rmErr := os.RemoveAll(mergedDir); rmErr != nil {
logrus.Warnf("Failed to remove %s: %v: %v", id, rmErr, err)
}
nit: s/craeted/created/
@@ -546,8 +543,9 @@ func (d *Driver) Get(id, mountLabel string) (_ containerfs.ContainerFS, retErr e
if mntErr := unix.Unmount(mergedDir, 0); mntErr != nil {
logrus.Errorf("error unmounting %v: %v", mergedDir, mntErr)
}
if rmErr := os.RemoveAll(mergedDir); rmErr != nil {
logrus.Warnf("Failed to remove %s: %v: %v", id, rmErr, err)
// Cleanup the craeted merged directory; see the comment in Put's rmdir
ditto
Can you please rebase it? |
Signed-off-by: Euan Kemp <euan.kemp@coreos.com>
This removes and recreates the merged dir with each umount/mount respectively.

This is done to make the impact of leaking mountpoints have less user-visible impact.

It's fairly easy to accidentally leak mountpoints (even if moby doesn't, other tools on linux like 'unshare' are quite able to incidentally do so).

As of recently, overlayfs reacts to these mounts being leaked (see

One trick to force an unmount is to remove the mounted directory and recreate it. Devicemapper now does this, overlay can follow suit.

Signed-off-by: Euan Kemp <euan.kemp@coreos.com>
rebased, addressed nits, autosquashed |
From the Z run:
Rebuilding. |
LGTM. |
ping @kolyshkin PTAL - nits were addressed |
LGTM
LGTM (disclaimer: I'm not a maintainer) |
LGTM, thanks all for reviewing!
Looks like this did not land in 17.11? I had to apply the patch manually, here's a slightly modified version which applies correctly on 17.11 in case anyone else needs this. |
No it's not part of 17.11, will be in 17.12 |
- Description for the changelog
Fix upperdir in use warnings under overlayfs and v4.13+ kernels