multus cni ignores errors raised on CNI DEL #1239
Comments
Mostly -- I think during cmdDel we want to ignore errors so that the pod can be deleted by the API. Otherwise, we retry on delete, and the pod can hang around and crashloop. So, with the CNI mentality in mind, we're very strict about success on ADD, but on DEL we're lenient and let CNI DELs fail so that pods aren't stuck in a crashloop.
But this leads to resource leaks (such as IP leaks) if cmdDel fails. Kubelet could retry deleting the sandbox if it knew about the failure.
The CNI spec addresses this: https://github.com/containernetworking/cni/blob/main/SPEC.md#del-remove-container-from-network-or-un-apply-modifications
So while you have a point that resources will sometimes be left behind, this is the suggested behavior for CNI plugins on a CNI DEL.
If we don't use multus, the pod will be stuck after
I think there should be a field in multus' CNI config that controls whether errors returned by CNI DEL are tolerated. Giving the user control over this behavior would improve the usability of multus.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.
What happened:
PR #1084 makes multus ignore the common errors raised by CNI plugins. I don't think this is what we expected.
For example, if a custom CNI's cmdDel returns an error, multus-shim also just logs the error and ignores it. As a result, containerd believes the deletion succeeded, and we see "TearDown network for sandbox xxx successfully" even though cmdDel actually failed.
Here are some containerd logs:
What you expected to happen:
multus should wrap and return the error raised by the CNI plugin, so that kubelet knows about the failure and can prevent the pod from being deleted.
How to reproduce it (as minimally and precisely as possible):
Write a fake CNI plugin whose cmdDel always returns an error.
Anything else we need to know?:
Environment:
image path and image ID (from 'docker images')
kubectl version
kubectl get net-attach-def -o yaml
kubectl get pod <podname> -o yaml