Linkerd 2.11.1 controller pods are not running with linkerd-cni option in AKS with Calico #7493
Comments
@ranjith-vatakkeel could you please give this another shot with our latest edge release? We've recently upgraded our CNI libraries dependencies, and that might help. |
@alpeb Thanks for the reply. I will have a check and update you. |
@ranjith-vatakkeel is your cluster configured with a custom cluster domain (i.e. not |
I was able to reproduce the issue using the latest edge, which does indeed only appear with the particular combination of Azure CNI + Calico. Unfortunately, I couldn't retrieve enough information to pinpoint the source of the problem. For the time being, the recommendation under this scenario remains, after installing linkerd-cni, to install linkerd using the flag |
@alpeb I tried to set |
Indeed, I was under the wrong impression that the |
@alpeb Thanks we will wait for that. Just checking, is it fine to remove |
@ranjith-vatakkeel that hook blocks the pod's main container from starting until the proxy is fully ready. If you remove it, the main container might start before the proxy is ready, and its inbound and outbound connections will fail until the proxy becomes ready. So whether that's fine depends on whether your main containers can tolerate that. |
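To see the hook being discussed on a running cluster, one option is to print each container's lifecycle stanza for a control-plane deployment. This is a minimal sketch: `show_lifecycle` is a hypothetical helper name (not part of the linkerd CLI), and the default `linkerd` namespace is an assumption.

```shell
# Hypothetical helper: print each container's name and lifecycle stanza
# for a deployment. The namespace defaults to "linkerd" (an assumption).
show_lifecycle() {
  deploy="$1"
  ns="${2:-linkerd}"
  kubectl -n "$ns" get deploy "$deploy" \
    -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{": "}{.lifecycle}{"\n"}{end}'
}
# Usage: show_lifecycle linkerd-destination
```

A container with no hook prints an empty value after its name, so the container carrying the postStart hook stands out.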
It turns out the issue isn't related to linkerd's CNI, and is more likely a glitch in the Azure CNI + Calico combo. I've opened Azure/AKS#2750 to track it down. |
Hey I have the same issue but on an EKS+Linkerd+Linkerd CNI + AWS CNI + Calico (for network policies) setup. After installing Linkerd-cni the destination and injector deployments won't start and are stuck in the await state (due to the lifecycle spec). Looks like it is not only an AKS problem. |
Thanks for the report @CCOLLOT. Are you able to reproduce the issue with a minimal example such as the one referred to in Azure/AKS/issues/2750? |
Here is what I get:
Using:
apiVersion: v1
kind: Pod
metadata:
  name: curl
spec:
  containers:
  - image: curlimages/curl
    name: curl
    command: [ "sh", "-c", "--" ]
    args: [ "while true; do curl -k https://10.0.0.1; done;" ]
    lifecycle:
      postStart:
        exec:
          command: [ "sh", "-c", "--", "while true; do sleep 30; done;" ]
The container is stuck in
Using:
apiVersion: v1
kind: Pod
metadata:
  name: othercurl
spec:
  containers:
  - image: curlimages/curl
    name: curl
    command: [ "sh", "-c", "--" ]
    args: [ "while true; do curl -k https://10.0.0.1; done;" ]
The container starts normally.
|
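For anyone repeating this comparison, a small helper can report where each pod ends up. This is only a sketch: `pod_state` is a hypothetical name, and the manifest file names in the usage comment are assumptions.

```shell
# Hypothetical helper: print a pod's phase and the state of its first container.
pod_state() {
  kubectl get pod "$1" \
    -o jsonpath='{.status.phase}{" "}{.status.containerStatuses[0].state}{"\n"}'
}
# Usage, assuming the two manifests were saved as curl.yaml and
# othercurl.yaml (hypothetical file names):
#   kubectl apply -f curl.yaml -f othercurl.yaml
#   pod_state curl
#   pod_state othercurl
```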
A similar issue is happening with just AWS VPC CNI + Linkerd CNI on AWS EKS (K8s 1.21, Linkerd 2.11.2 stable, VPC CNI 1.10.1). |
It's possible this is related to #8296. I've pushed a proxy image (which will be included in this week's edge release) that can be used for testing: |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions. |
What is the issue?
A fresh installation of the linkerd 2.11.1 control plane along with the CNI plugin ends with pods stuck in the ContainerCreating state.
How can it be reproduced?
Steps:
linkerd install-cni | kubectl apply -f -
linkerd install --linkerd-cni-enabled | kubectl apply -f -
You will notice that both the destination and proxy-injector pods are stuck in the ContainerCreating state.
Logs, error output, etc
Since the pods are not starting, there are no logs or errors from the respective pods. The identity pod was complaining about reachability to the other pods.
Output of linkerd check -o short:
$ linkerd check -o short
Linkerd core checks
linkerd-existence
\ No running pods for "linkerd-destination"
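Because the stuck pods produce no logs, namespace events and a list of non-running pods are one place to look for more detail. A minimal sketch, with hypothetical helper names and the default `linkerd` namespace assumed:

```shell
# Hypothetical helper: list events in a namespace, sorted by last timestamp.
recent_events() {
  ns="${1:-linkerd}"
  kubectl -n "$ns" get events --sort-by=.lastTimestamp
}

# Hypothetical helper: list pods that are not in the Running phase.
pending_pods() {
  ns="${1:-linkerd}"
  kubectl -n "$ns" get pods --field-selector=status.phase!=Running
}
# Usage:
#   pending_pods
#   recent_events
```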
Environment
Possible solution
Solution 1:
Reprovision the AKS cluster with the same config but without Calico. All features then work as expected.
Solution 2:
Reinstall linkerd without the linkerd-cni plugin; this works like a charm.
Solution 3:
Remove the lifecycle section from the destination and proxy-injector pod specs. The pods then start and everything seems to work, but I don't know whether this is the right solution for a PROD environment.
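Solution 3 could be sketched as a JSON patch against each deployment. This is an illustration under assumptions, not a recommended procedure: `remove_lifecycle` is a hypothetical helper, the container index (default 0) must be checked against the actual pod spec, and the `linkerd` namespace is assumed.

```shell
# Hypothetical helper: remove the lifecycle stanza from one container of a
# deployment. The container index (default 0) and the namespace (default
# "linkerd") are assumptions; verify both before running.
remove_lifecycle() {
  deploy="$1"
  index="${2:-0}"
  ns="${3:-linkerd}"
  kubectl -n "$ns" patch deploy "$deploy" --type=json \
    -p "[{\"op\":\"remove\",\"path\":\"/spec/template/spec/containers/${index}/lifecycle\"}]"
}
# Usage:
#   remove_lifecycle linkerd-destination
#   remove_lifecycle linkerd-proxy-injector
```

The patch fails if the chosen container has no lifecycle section, which acts as a guard against picking the wrong index.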
Additional context
It looks like the AKS Calico feature causes problems for linkerd when it is deployed with linkerd-cni.
The older linkerd version 2.10.2 worked fine with Calico and linkerd-cni, so this seems to be a bug in the new version.
Would you like to work on fixing this bug?
no