NetworkPolicy not working in EKS #5761
Comments
In my opinion, it would be useful to share at least the network policy manifest.
That is correct.
It does not matter if I try to connect to the pod directly or via the service. I am always able to connect.
Your Network Policy seems to be wrong.
Hi @vladyslav-mahilevskyi, at which logs would I look to see if Calico even evaluates the config?
@Furragen I would check the calico/node pod on that host for warning / error level logs.
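A minimal way to pull those logs, assuming Calico was installed into the calico-system namespace (it may be kube-system depending on the install method, and the pod name is a placeholder):

```sh
# Find the calico-node pod scheduled on the node in question
kubectl get pods -n calico-system -o wide | grep <node-name>

# Grep its logs for warning/error-level entries
kubectl logs -n calico-system calico-node-xxxxx | grep -iE 'warn|error'
```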
Hi @caseydavenport, the logs of the calico pod on the node contain no errors.
This looks to me like a hint that Calico at least knows about the policy.
I was reminded of this issue, but if you're not seeing the ipset error logs then that OS is probably using a compatible ipset version. Are you able to share the relevant iptables rules from that node?
Hi, here are the (probably relevant) iptables rules from the node running the pod that should not be reachable:
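For anyone reproducing this, one way to grab the Calico-related rules on a node is to filter the iptables-save output (run as root on the node; "cali" is Calico's usual chain prefix):

```sh
# Dump all rules and keep only Calico's chains and rules
sudo iptables-save | grep -i cali
```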
^ These look like the relevant chains to me. You can see that the policy exists and has the correct rules within it here:
One thing that is missing in this output is a measure of which rules are being hit - I usually prefer to look at the packet counters.
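iptables can report per-rule packet/byte counters, which is presumably what is meant here; a sketch:

```sh
# -c includes packet/byte counters for every rule
sudo iptables-save -c | grep -i cali

# Or: verbose listing of the filter table with counters, no DNS lookups
sudo iptables -L -v -n
```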
Hi, is that helpful for you?
It seems to suggest that no packets are reaching the default-deny rule, so they are probably being accepted / handled earlier in iptables processing.
Okay, that's what I was suspecting. Why would that be? Also, I stumbled upon logs like this while researching something else:
This is from the apiserver.
Probably not - that error is probably just the API server's normal timeout for watches, which triggers Calico to restart the watch. Unless you're seeing error logs in the Calico logs that seem correlated, that's probably a red herring. I think the next step here is just to collect all of the relevant diags at once so I can try to comb through for what might be wrong. Right now I'm missing a few things:
I'll also try to find some time to repro this myself, but our automated EKS tests do verify this setup, so I expect this might be something specific to your environment.
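If calicoctl is available on the node, it can collect a diagnostics bundle in one shot (a sketch; the tool prints where it writes the bundle):

```sh
# Gathers logs and system state into a tarball for later analysis
sudo calicoctl node diags
```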
Hi, here are the calico/node logs (only from today; if you want all of them I can post them, but it's really a lot), and the output of the commands you asked for.
Thanks a lot for looking into this!
Looks like, for whatever reason, the Calico to / from workload chains are being skipped entirely. These rules should be matched by any traffic going through any cali* interface.
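A quick way to check whether traffic is even hitting Calico's chains is to confirm the jump from the FORWARD chain exists and that its counters are moving (a sketch, assuming the default cali-FORWARD chain name):

```sh
# The FORWARD chain should contain a jump to cali-FORWARD near the top
sudo iptables -L FORWARD -v -n | head

# Check the counters on Calico's forward chain while sending test traffic
sudo iptables -L cali-FORWARD -v -n
```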
So I guess the pods having more than one interface is a problem?
Potentially - what does your CNI config look like? Are you using Multus or something similar to attach pods to multiple networks? If the pod is sending traffic down an interface that enters the host on a veth (or other interface type) that Calico doesn't know about, it won't be able to enforce policy.
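To check for extra interfaces, something like the following works (pod and namespace names are placeholders):

```sh
# List interfaces inside the pod; with vpc-cni you'd normally expect a single eth0
kubectl exec -n test mypod -- ip addr

# On the host: vpc-cni names its host-side veths eni*, Calico's start with cali
ip -o link | grep -E 'eni|cali'
```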
We use vpc-cni, which is installed as an EKS add-on (v1.10.3-eksbuild.1). The config is just the defaults. Calico is deployed via Helm (v3.23.1).
Everything else is just the defaults.
Hi,
aws-vpc-cni creates pod interfaces starting with eni. Can we see your CNI config file please? i.e. the file on your hosts in /etc/cni/net.d. I think we need to fix up your CNI config and disable the rogue plugin.
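On a typical node the CNI config lives in /etc/cni/net.d, and the runtime uses the lexicographically first .conf/.conflist file, so a rogue plugin would show up there (a sketch; the filename follows the vpc-cni template):

```sh
# List CNI configs; the first file in sort order is the one the runtime uses
ls -l /etc/cni/net.d/

# Inspect the aws-cni config
cat /etc/cni/net.d/10-aws.conflist
```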
Might be unrelated, but I've seen reports recently where containerd and CRI-O have started shipping with their own CNI plugin (which doesn't work in Kubernetes).
I just wanted to provide you with the configs you asked for, but when I connected to one of the nodes there was no such file.
I wonder how any rogue CNI might have gotten into the cluster though, as I only ever installed vpc-cni and Calico, nothing else. Also, thanks for your feedback :)
That file looks pretty close to the template in the vpc-cni repo: https://github.com/aws/amazon-vpc-cni-k8s/blob/master/misc/10-aws.conflist
So we need to figure out where on your nodes the CNI config is.
So, I made a mistake when I first looked for the config.
Hello, I just found this issue while troubleshooting a very similar behavior on our side and thought sharing might help. When we tried to play with a basic namespace deny-all policy in order to validate that Calico was still working, we had trouble :)
I checked the pod interfaces as well and there is only one interface attached to the pod. If there are tests you want me to perform, let me know :)
Any news on this?
Hey @Linutux42, did you maybe find a solution or workaround?
Hello, I'm having the same issue using EKS version 1.22 with the Amazon-optimized AMI and the VPC CNI + Calico plugin.
@Furragen I haven't had time to investigate more for now, sorry :/
Hi, same problem here. I'm following the stars policy demo instructions, and when I get to adding the default-deny policy, it has no effect. My environment is:
Hello, I upgraded my EKS clusters to 1.22 a few days ago and found a few minutes to test the migration to the official tigera-operator helm chart v3.24.1 again. @Furragen Try with v3.24.1 and see if it works ¯\_(ツ)_/¯
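For anyone wanting to try the same, an upgrade via the documented Helm repo might look like this (release and namespace names are assumptions from a default install):

```sh
helm repo add projectcalico https://docs.tigera.io/calico/charts
helm repo update

# Upgrade the tigera-operator release to v3.24.1
helm upgrade calico projectcalico/tigera-operator --version v3.24.1 -n tigera-operator
```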
Any update on this? I am stuck with the same issue and not able to enforce NetworkPolicy. It works sometimes, but not every time I reapply.
Greetings, we are having the same issue but only with one cluster out of two:
Closing as stale, please reopen if you are still seeing these issues.
EKS 1.27, VPC-CNI, Calico 3.26.1. Behavior seems to be quite random. Some policies work as expected. Some services just time out without any policies, and others work fine.
I am seeing this with EKS 1.24 and VPC-CNI with Calico 3.26.1... Exactly as @aadamovich describes: sometimes things are blocked, sometimes they are not. Results are very unpredictable.
Official documentation now states:
@tomastigera can you please re-open the issue? At present this is the only one I can find tracking IPv6 netpols with the EKS provider, and it definitely doesn't work.
In my case I am not using IPv6; I followed the Calico/AWS EKS doc to the T and nothing works... Something I wonder about is whether the use of secondary 100.x.x.x subnets for pods is affecting it.
I also see log entries like this in calico-node:

```
2023-07-28 17:13:11.977 [INFO][281] felix/int_dataplane.go 1836: Received proto.WorkloadEndpointUpdate update from calculation graph msg=id:<orchestrator_id:"k8s" workload_id:"default/mycurlpod" endpoint_id:"eth0" > endpoint:<state:"active" name:"calie49fb8bfb10" profile_ids:"kns.default" profile_ids:"ksa.default.default" ipv4_nets:"100.64.185.16/32" tiers:<name:"default" ingress_policies:"default.deny-app-policy" ingress_policies:"default/default.default-deny" egress_policies:"default.deny-app-policy" egress_policies:"default/default.default-deny" > >
```

Global policy:

```
[root@ip-10-184-9-79 ~]# kubectl get globalnetworkpolicy -o yaml
```
Does anyone know if this is the proper doc to follow to enable K8s policies using Calico with EKS? AWS doesn't seem to reference purging the aws-node daemonset:

https://docs.aws.amazon.com/eks/latest/userguide/calico.html - doesn't reference deleting the "aws-node" daemonset

whereas the Calico doc says to purge "aws-node": https://docs.tigera.io/calico/latest/getting-started/kubernetes/managed-public-cloud/eks#install-eks-with-amazon-vpc-networking

Once aws-node is purged and the rest of the instructions are followed, NetworkPolicies work... From the Calico doc: "Since this cluster will use Calico for networking, you must delete the aws-node daemon set to disable AWS VPC networking for pods."

```
kubectl delete daemonset -n kube-system aws-node
```
I think there might be some confusion. There are two ways to install Calico on EKS. One is combining Calico with AWS-VPC-CNI (which is installed by aws-node); those installation instructions are here: https://docs.aws.amazon.com/eks/latest/userguide/calico.html. The other is Calico with Calico CNI; those installation instructions are here: https://docs.tigera.io/calico/latest/getting-started/kubernetes/managed-public-cloud/eks#install-eks-with-calico-networking. Both methods should "just work". Both methods have pros and cons (see the docs).
Hi, we are facing similar issues: sometimes the connection is successful and sometimes it fails. Any response much appreciated.
I have an EKS cluster in which I deployed Calico for NetworkPolicy enforcement.
To test if everything works, I created a test namespace with a pod in it and a NetworkPolicy that denies all ingress traffic.
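A minimal deny-all-ingress policy of the kind described would look roughly like this (the namespace name is a placeholder):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: test
spec:
  # Selects every pod in the namespace
  podSelector: {}
  # Listing only Ingress with no rules denies all inbound traffic
  policyTypes:
    - Ingress
```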
If I now try to reach this pod from another pod in another namespace, it just works.
Expected Behavior
I expected the pod to not be reachable.
Current Behavior
The pod is reachable.
Possible Solution
Steps to Reproduce (for bugs)
Context
I want to isolate workloads in different namespaces from each other, which is not possible now.
Your Environment