First scheduled pod starts using the wrong CNI #219

2ZZ · 2021-01-28T12:29:04Z

Hi,

The first pods scheduled on a node can sometimes get an IP from the non-default CNI.
I think this is because the CNI-Genie daemonset pod can start too late, possibly due to delays in pulling the image.
I added the system-node-critical priorityClass to the CNI-Genie daemonset, but it has not helped.

Example timeline:

Nginx app pod triggers cluster scale up
AWS-CNI, Calico and CNI-Genie daemonsets are scheduled on the new node
Nginx pod starts up before the CNI-Genie pod has finished starting, so the CNI-Genie config is not yet in /etc/cni/net.d at this point
Nginx pod gets an IP from AWS-CNI instead of the default set in CNI-Genie
Future pods on that new node are given correct IPs once CNI-Genie has started
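
The timeline above is consistent with how CNI configs are usually selected: the kubelet or container runtime uses the config file in /etc/cni/net.d that sorts first by name. A minimal sketch of that selection rule follows. The file names (`00-genie.conf`, `10-aws.conflist`, `10-calico.conflist`) and the `pick_cni_config` helper are assumptions for illustration, not taken from this cluster.

```python
# Sketch of "first config file in lexicographic order wins".
# File names below are assumed examples, not read from a real node.
CNI_EXTENSIONS = (".conf", ".conflist", ".json")


def pick_cni_config(filenames):
    """Return the config file name that sorts first, or None if there is none."""
    candidates = sorted(f for f in filenames if f.endswith(CNI_EXTENSIONS))
    return candidates[0] if candidates else None


# Before CNI-Genie has written its config, another plugin's file sorts first.
before_genie = ["10-aws.conflist", "10-calico.conflist"]
# Once the CNI-Genie config exists, it sorts ahead of the others.
after_genie = before_genie + ["00-genie.conf"]

print(pick_cni_config(before_genie))
print(pick_cni_config(after_genie))
```

Under these assumed names, any pod sandbox created before the CNI-Genie file appears would be set up by whichever other plugin's config sorts first.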

Setup:
Cluster: Amazon EKS 1.18
Calico version: 3.16.3
CNI-Genie version: latest

shinebayar-g · 2021-03-16T15:35:29Z

That sounds like a possible scenario. We're using CNI-Genie to run Cilium + AWS VPC.

Fortunately we didn't observe this issue when upgrading worker versions.

2ZZ · 2021-03-18T18:40:21Z

Hi, I modified AWS CNI to wait for CNI-Genie to be present in /etc/cni and haven't seen this issue since.
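
The exact change to AWS CNI is not shown in this thread. As a rough sketch of the idea only, a startup step could poll the CNI config directory until a CNI-Genie file appears. The directory default, the `*genie*` file pattern, and the `wait_for_cni_config` helper below are all assumptions, not the actual modification.

```python
import glob
import os
import time


def wait_for_cni_config(conf_dir="/etc/cni/net.d", pattern="*genie*",
                        timeout=120.0, interval=1.0):
    """Poll conf_dir until a file matching pattern exists.

    Returns the first matching path (sorted by name), or None if
    `timeout` seconds pass without a match.
    """
    deadline = time.monotonic() + timeout
    while True:
        matches = sorted(glob.glob(os.path.join(conf_dir, pattern)))
        if matches:
            return matches[0]
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

A caller would invoke `wait_for_cni_config()` before the other plugin writes its own config, and treat a `None` result as a timeout to handle explicitly (retry or fail) rather than silently continuing.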
