Network plugin not informed of missing pods #14940
Comments
@kubernetes/rh-networking
That makes sense to me, although I guess the plugin could also just assume that if it gets a setUp on a pod it thinks is already running, it should tearDown the old network state first.
We have a few edge cases we should consider holistically, Case 1 through Case 6. In case 6 we have an analogous problem in volumes: we don't have a pod. This bug in particular seems to be about cases 3 and 4. I am out today, but I wanted to weigh in just a bit.
The network plugin isn't necessarily going to have the state necessary to make that decision, since it is only informed of the ID for the new pod, and has no way to connect that to the old pod that is no longer running.
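(A rough sketch of what the plugin-side approach would require: the plugin would have to keep its own pod-to-infra-container mapping, since kubelet only hands it the new container ID. The interface shape and names below are illustrative, not the actual kubelet network plugin API.)

```go
package netplugin

import "sync"

// podKey identifies a pod independently of any container ID.
type podKey struct{ Namespace, Name string }

// statefulPlugin sketches the bookkeeping a plugin would need in order to
// notice that SetUpPod is being called for a pod it already set up under a
// different (now dead) infra container, and tear the old state down first.
type statefulPlugin struct {
	mu   sync.Mutex
	last map[podKey]string // pod -> infra container ID last set up
}

func newStatefulPlugin() *statefulPlugin {
	return &statefulPlugin{last: make(map[podKey]string)}
}

func (p *statefulPlugin) SetUpPod(namespace, name, containerID string) error {
	key := podKey{namespace, name}

	p.mu.Lock()
	old, seen := p.last[key]
	p.mu.Unlock()

	// If networking was already set up for this pod under a different
	// container ID, assume that container is gone and clean up first.
	if seen && old != containerID {
		// Best effort: a failure here should not block the new pod.
		_ = p.TearDownPod(namespace, name, old)
	}

	// ... real setup work for containerID would go here ...

	p.mu.Lock()
	p.last[key] = containerID
	p.mu.Unlock()
	return nil
}

func (p *statefulPlugin) TearDownPod(namespace, name, containerID string) error {
	p.mu.Lock()
	delete(p.last, podKey{namespace, name})
	p.mu.Unlock()
	// ... real teardown work would go here ...
	return nil
}
```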
I have been thinking about a solution to this issue. At first, I was hoping there would be a simple fix. I found that if the pod infra container is removed but other containers belonging to that pod are still up, we can extract the pod infra container ID (the docker ID) from the ResolvConfPath attribute of the remaining containers' Container objects (as created by go-dockerclient). The problem with this solution is that it only covers cases 3 and 4, not cases 5 and 6, where all of the pod's containers are completely destroyed. A possible solution that may cover all cases is adding a new field to the PodStatus API object. @thockin, does this seem like a reasonable approach? I wanted your thoughts before I went ahead and started to implement the fix.
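(A minimal sketch of the ResolvConfPath trick described above, using go-dockerclient. It assumes docker keeps each container's resolv.conf under /var/lib/docker/containers/&lt;infra-id&gt;/resolv.conf, which may vary by docker version and storage setup.)

```go
package main

import (
	"fmt"
	"path/filepath"

	docker "github.com/fsouza/go-dockerclient"
)

// infraIDFromSurvivor recovers the (possibly deleted) pod infra container's
// ID from the ResolvConfPath of a surviving container in the same pod, since
// the other containers share the infra container's network namespace and
// resolv.conf.
func infraIDFromSurvivor(client *docker.Client, survivorID string) (string, error) {
	c, err := client.InspectContainer(survivorID)
	if err != nil {
		return "", err
	}
	// ResolvConfPath looks like .../containers/<infra-id>/resolv.conf, so the
	// parent directory name is the infra container ID (assumption noted above).
	dir := filepath.Dir(c.ResolvConfPath)
	return filepath.Base(dir), nil
}

func main() {
	client, err := docker.NewClientFromEnv()
	if err != nil {
		panic(err)
	}
	id, err := infraIDFromSurvivor(client, "some-surviving-container-id")
	if err != nil {
		panic(err)
	}
	fmt.Println("pod infra container ID:", id)
}
```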
It looks like something similar happens in my #18967, but for the exec network plugin. Just leaving it here.
What's the status on this? It's 6 months old - is it still valid?
I believe this issue is still valid, but I'm not aware of anyone working on it. For Calico we've been able to work around it for the cases we were seeing (3 and 4 above). I haven't tested 5 and 6 personally.
We are basically looking at reconciliation and disaster recovery. Currently, network plugins only have a bunch of hooks that kubelet can trigger; we need a sync loop of some sort for reconciliation. Also, we are dealing with multiple runtimes. I believe the new runtime interface can simplify things. It needs major rework to make this right.
If we want to do reconciliation, we have to know the current state. The best way I can think of is through a CNI Status interface.
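(As an illustration of the shape such a Status interface could take; this is not part of the CNI spec or the kubelet network plugin interface, just a sketch.)

```go
package netplugin

// PodNetworkStatus describes what the plugin believes is currently set up
// for one pod. Field names are hypothetical.
type PodNetworkStatus struct {
	Namespace   string
	Name        string
	ContainerID string
	IP          string
}

// Reconciler would let kubelet ask the plugin for its view of the world and
// then tear down anything that no longer corresponds to a running pod.
type Reconciler interface {
	// Status returns every pod network the plugin still has state for.
	Status() ([]PodNetworkStatus, error)
	// TearDownStale releases state for pods kubelet no longer knows about.
	TearDownStale(stale []PodNetworkStatus) error
}
```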
That assumes that kubernetes knows all of the state that the plugin cares about. In OpenShift, our plugin has an "Update" method and we just call it on all pods when a reconciliation is needed (i.e., when kubelet is [re]started). (We don't currently deal with the missing pod problem.)
This is pretty icky. Load a node up with pods and hit this, and the node doesn't seem to recover; we haven't documented resuscitation.
Hmm, maybe we can't docker inspect the container, so we return without invoking tearDown? https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockertools/docker_manager.go#L1428
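(For illustration, the alternative ordering would look roughly like this; types and names are simplified stand-ins, not the actual docker_manager.go code.)

```go
package dockertools

// Minimal stand-ins for the kubelet pieces involved.
type inspector interface {
	InspectContainer(id string) (interface{}, error)
}

type networkPlugin interface {
	TearDownPod(namespace, name, containerID string) error
}

type dockerManager struct {
	client        inspector
	networkPlugin networkPlugin
}

// killPodNetwork treats the inspect result as advisory: even if the infra
// container is already gone (inspect fails), it still attempts TearDownPod so
// IPAM and other plugin state get released, instead of returning early.
func (dm *dockerManager) killPodNetwork(namespace, name, infraID string) error {
	if _, err := dm.client.InspectContainer(infraID); err != nil {
		// Container already gone: fall through and tear down anyway,
		// best effort.
	}
	return dm.networkPlugin.TearDownPod(namespace, name, infraID)
}
```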
Also, I think there are different cases we're conflating into this bug:
@bprashanth you can't always do that, because it's IPAM method dependent. You may need to send a DHCP release or some other operation that requires the netns around. I suppose we could amend the CNI spec to say that the IPAM plugin should release the IP without CNI_NETNS if it can, but there are certainly edge cases where that's not possible.
Whether we can clean up correctly by storing details (like the netns) somewhere, so that we can tear down networking when the container is dead and kubelet restarts, depends on whether the namespace is mounted somewhere. IIRC, with CNI/kubenet on docker, docker creates the netns and cleans it up when the infra container goes away. So if the container goes away while kubelet is dead, we cannot get the netns on kubelet restart and we may not be able to cleanly release IPAM. With CNI/kubenet on rkt, the namespace is bind-mounted and will not be removed until explicitly removed by kubenet or GC-ed by the rkt runtime somehow. Since netns creation is under the control of the rkt runtime, I think we can guarantee that IPAM release happens cleanly.
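(A sketch of the bind-mount approach described above, showing how a netns could be pinned so it outlives the infra container and can still be handed to a later teardown; the /var/run/netns layout and paths here are assumptions, Linux-only, and require root.)

```go
package netnsutil

import (
	"fmt"
	"os"
	"syscall"
)

// pinNetns bind-mounts the infra container's network namespace to a
// well-known path so it survives the death of every process in the pod and
// can still be entered during TearDown / CNI DEL later.
func pinNetns(infraPID int, podID string) (string, error) {
	src := fmt.Sprintf("/proc/%d/ns/net", infraPID)
	dst := fmt.Sprintf("/var/run/netns/%s", podID)

	// The bind-mount target must exist; an empty file is enough.
	f, err := os.OpenFile(dst, os.O_CREATE|os.O_RDONLY, 0444)
	if err != nil {
		return "", err
	}
	f.Close()

	if err := syscall.Mount(src, dst, "none", syscall.MS_BIND, ""); err != nil {
		os.Remove(dst)
		return "", err
	}
	return dst, nil
}

// unpinNetns releases the pinned namespace once teardown has completed.
func unpinNetns(path string) error {
	if err := syscall.Unmount(path, 0); err != nil {
		return err
	}
	return os.Remove(path)
}
```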
There were 2 problems when this issue was initially filed:
The state of things today is a little different:
I agree with the previous comment: we now need to either remember the netns somewhere, reverse-engineer it from information available on exited containers (i.e., not the PID), or release resources without it.
@bprashanth how is the second item ("kubelet will deliver a teardown event for each infra container") fixed? Can you point me to a commit that did that?
Also, in CNI upstream we're discussing adding language that DEL should be best-effort and that even if the netns isn't present, the plugin should still clean up whatever it can, including IPAM. That would get rid of a couple of blocks here, and I think is appropriate. The major problem people have is likely IPAM leases not being GC-ed when the infra pod is gone.
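(A sketch of what best-effort DEL could look like in a CNI plugin's del handler: release IPAM unconditionally, then skip the in-namespace cleanup if the netns is already gone. The ipam.ExecDel call follows the containernetworking plugins' host-local usage, but exact import paths and signatures vary by CNI version, so treat this as a sketch rather than a drop-in handler.)

```go
package cniexample

import (
	"os"

	"github.com/containernetworking/cni/pkg/skel"
	"github.com/containernetworking/plugins/pkg/ipam"
)

// cmdDel is the handler a plugin would register via skel.PluginMain.
func cmdDel(args *skel.CmdArgs) error {
	// Always release the IPAM lease first; this must not depend on the
	// network namespace still existing. "host-local" is hardcoded here for
	// brevity; real plugins read the IPAM type from the parsed netconf.
	if err := ipam.ExecDel("host-local", args.StdinData); err != nil {
		return err
	}

	// If the netns is gone (container died, node rebooted, docker restarted),
	// there is nothing left to tear down inside it; treat that as success.
	if args.Netns == "" {
		return nil
	}
	if _, err := os.Stat(args.Netns); os.IsNotExist(err) {
		return nil
	}

	// ... remove the veth / interface inside args.Netns here ...
	return nil
}
```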
Automatic merge from submit-queue (batch tested with PRs 40505, 34664, 37036, 40726, 41595)

dockertools: call TearDownPod when GC-ing infra pods

The docker runtime doesn't tear down networking when GC-ing pods; rkt already does, so make docker do it too. To ensure this happens, infra pods are now always GC-ed rather than gating them by containersToKeep. This prevents IPAM from leaking when the pod gets killed for some reason outside kubelet (like a docker restart) or when pods are killed while kubelet isn't running. Fixes: #14940 Related: #35572
The docker runtime doesn't tear down networking when GC-ing pods; rkt already does, so make docker do it too. To ensure this happens, networking is always torn down for the container even if the container itself is not deleted. This prevents IPAM from leaking when the pod gets killed for some reason outside kubelet (like a docker restart) or when pods are killed while kubelet isn't running. Fixes: kubernetes#14940 Related: kubernetes#35572
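(A condensed sketch of the behavior the commits above describe: during container GC, tear down networking for dead infra containers regardless of whether the container record itself is kept. Types and names are simplified stand-ins, not the actual kubelet GC code.)

```go
package gc

type networkPlugin interface {
	TearDownPod(namespace, name, containerID string) error
}

type deadContainer struct {
	ID        string
	Namespace string
	PodName   string
	IsInfra   bool
}

// evictContainers tears down pod networking for every dead infra container it
// is about to remove, so IPAM and other plugin state are released even when
// the pod died while kubelet was not running.
func evictContainers(plugin networkPlugin, dead []deadContainer, remove func(id string) error) error {
	var firstErr error
	for _, c := range dead {
		if c.IsInfra {
			if err := plugin.TearDownPod(c.Namespace, c.PodName, c.ID); err != nil && firstErr == nil {
				firstErr = err
			}
		}
		if err := remove(c.ID); err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}
```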
I've noticed a case where the network plugin is not informed of pods which have gone missing. Specifically, if Docker is restarted (or the node rebooted), any containers which had been running will be stopped. The kubelet will notice this and re-create the pod infra container.
However, the network plugin will not have any knowledge that the previous infra container / pod are no longer running.
I've specifically run into this issue in the context of IPAM:
I'm not sure what the ideal fix is here. I think one option would be to declare pod tearDown as idempotent, and whenever the kubelet detects that an infra container should be re-created (was running, is now missing), it calls the network plugin tearDown on the previous container before setUp on the new infra container.

@thockin - Any thoughts?
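(A small sketch of that proposal on the kubelet side, assuming tearDown is treated as idempotent; names are illustrative rather than the real kubelet plugin hooks.)

```go
package kubeletnet

type plugin interface {
	SetUpPod(namespace, name, containerID string) error
	TearDownPod(namespace, name, containerID string) error
}

// recreateInfraNetwork is called when kubelet notices that the pod infra
// container it previously started is no longer running and a new one has
// been created.
func recreateInfraNetwork(p plugin, namespace, name, oldInfraID, newInfraID string) error {
	if oldInfraID != "" {
		// Idempotent by convention: tearing down a container that is already
		// gone should succeed (or at worst return an ignorable error).
		_ = p.TearDownPod(namespace, name, oldInfraID)
	}
	return p.SetUpPod(namespace, name, newInfraID)
}
```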