Fixes #3181, right?
@euank I think so, but I have to go and double-check many similar bugs. I was using coreos/bugs#1630 (comment) as a testcase, which is close to a pathological worst case, as it does:
This fixes multiple issues in pods GC. In particular, rkt now tries harder to clean its environment by:
* looping over all pods, even if one of them is faulty
* leaving manifest removal as the last step, as next GC runs may succeed
* chroot-ing to the pod before cleaning, to avoid chasing wrong symlinks
* chdir-ing to the (host) root, to avoid keeping directories busy
* loop-unmounting all mountpoints, even if one umount syscall fails
* detecting busy mounts, and marking them as detached for lazy unmount
if err := os.RemoveAll(p.Path()); err != nil {
	stderr.PrintE(fmt.Sprintf("unable to remove pod %q", p.UUID), err)
	os.Exit(254)
	return
Shouldn't this return an err so the exit code can still get set?
Yes, it makes sense. I will check if this can be properly fitted at the caller sites.
	return fmt.Errorf("error chroot-ing to %q: %s", targetPath, err)
}
defer syscall.Chroot(".")
defer syscall.Fchdir(rootFd)
Does it make sense to have a central chroot facility? I did the same thing here: https://github.com/coreos/rkt/pull/3490/files#diff-0eb4fc0b150ea176f98a75960dd51bceR475
It may make sense for DRY; however, I'm a bit worried that introducing wrappers here will only make the code harder to follow. Also, your chroot helper there bails out on errors, but here I'd prefer to defer-clean unconditionally.
// outside of this rootfs - in descending nest order (parent first)
// 5. unmount all mount targets - in ascending nest order (children first).
//    If unmount fails, lazy-detach the mount target so that the kernel can
//    still clean it up once it ceases to be busy
👍 nice
@euank I'm now reporting a boolean success status, as it was already expected by rm. For gc, it looks like a bit of rework is needed on
I've re-triggered the CI, but this should be just the usual flake captured in #3477. Otherwise this should be ready to go, PTAL.
fwiw I did a cursory review of this the other week and it lgtm
@jonboulle thanks for the posthumous feedback, I also got an additional double-check offline before merging.
rkt has a gnarly bug (rkt/rkt#3181) that won't be fixed in a hurry (rkt/rkt#3486). It leads to continuous task failures that eventually totally wreck worker nodes (kubernetes-retired#244). In the meantime we can use docker just as easily for this simple task. This workaround was discussed in kubernetes-retired#199.
Will it be feasible to add a flag to NOT unmount lazily, or even to consider removing lazy unmount altogether? I'd like to have explicit errors in case something is holding resources. The fact that
Care to elaborate? If I read this correctly, your usecase would benefit from a
@lucab I was referring to
This bug, possibly together with other bugs, managed to create 200k entries in
You are right, but that huge number of mounts exists in the first place because other rkt pods are nested or not properly cleaned up, with shared propagation in place between them. After this, as the GC pass progresses, those will be untangled pod after pod. The lazy part here is needed to get out of the deadlock without a node reboot or forced unmounts.
Once you get out of the deadlock, what state will the system be in? My understanding is that it just sweeps the problem under the carpet and no real unmounts happen. If lazy unmount enables the kernel to resolve cyclic dependencies or something like that, making it do real unmounts, then I am happy with it.
That's the rationale for this change; see #3465 for a better explanation of the case at hand.