EKS tries to create system processes (kube-system namespace) incorrectly on Admiralty virtual nodes #158

matt-slalom opened this issue Jan 10, 2023 · 4 comments

matt-slalom commented Jan 10, 2023

Scenario

  • AWS EKS cluster (K8s 1.24) using spot instance nodes
  • Admiralty 0.15.1

Problem Description

Certain pods get stuck in "Terminating" status, apparently because K8s is trying to schedule processes on the Admiralty virtual nodes. Not sure if this is connected to the use of spot instances (where nodes disappear). If it is connected, I'd expect this would also affect autoscaling groups (unconfirmed).

Issue

Admiralty appears to be confusing EKS. EKS is trying to schedule pods like kube-proxy, ebs-csi-node, and aws-node on Admiralty virtual nodes and failing.

Is there a taint we should be putting on the EKS-supplied nodes?

Observations

Representative pod states after nodes disappear (output truncated):

kubectl get pods -n kube-system                                     
NAME                                 READY   STATUS        RESTARTS   AGE
aws-node-76rwv                       0/1     Terminating   0          4d3h
aws-node-lbsjj                       1/1     Running       0          2d23h
aws-node-mlvh4                       1/1     Running       0          2d22h
aws-node-t8nb2                       0/1     Terminating   0          4d3h
coredns-799c5565b4-6446n             1/1     Running       0          2d22h
coredns-799c5565b4-gltcn             1/1     Running       0          2d23h
ebs-csi-controller-b5d8854df-885vr   6/6     Running       0          2d22h
ebs-csi-controller-b5d8854df-zhsrx   6/6     Running       0          2d23h
ebs-csi-node-g7kv9                   3/3     Running       0          2d22h
ebs-csi-node-jhj6l                   0/3     Terminating   0          4d3h
ebs-csi-node-rpsjr                   3/3     Running       0          2d23h
ebs-csi-node-rqs8r                   0/3     Terminating   0          4d3h
kube-proxy-d6hdx                     0/1     Terminating   0          4d3h
kube-proxy-hr6k7                     0/1     Terminating   0          4d3h
kube-proxy-l26nj                     1/1     Running       0          2d22h
kube-proxy-rbqh8                     1/1     Running       0          2d23h

Investigate one of the stuck pods (note: 10250 is the port used by Admiralty). The x.x.x.114 address no longer exists in the cluster, so I'm guessing it previously belonged to a pod on a node that no longer exists (spot instance).

kubectl logs aws-node-76rwv -n kube-system     
Defaulted container "aws-node" out of: aws-node, aws-vpc-cni-init (init)
Error from server: Get "https://172.16.2.114:10250/containerLogs/kube-system/aws-node-76rwv/aws-node": dial tcp 172.16.2.114:10250: connect: connection refused

Force kill the pod to clean up, and K8s spawns a new one, but it stays stuck in Pending:

kubectl delete pod --force --grace-period=0 -n kube-system aws-node-76rwv                                                                                       
Warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "aws-node-76rwv" force deleted

# wait a while

kubectl get pods -n kube-system aws-node-jcrqf
NAME             READY   STATUS    RESTARTS   AGE
aws-node-jcrqf   0/1     Pending   0          39m

K8s is trying to start the pod on a virtual node for some reason:

kubectl -n kube-system describe pod aws-node-jcrqf |tail -4
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  40m   default-scheduler  Successfully assigned kube-system/aws-node-jcrqf to admiralty-default-app-gke-cluster-1998d80ea2

Other pods are running on EKS nodes. The "pending" and "terminating" pods are scheduled on Admiralty virtual nodes.

kubectl -n kube-system get pods -o wide
NAME                                 READY   STATUS        RESTARTS   AGE     IP             NODE                                                    NOMINATED NODE   READINESS GATES
aws-node-jcrqf                       0/1     Pending       0          41m     <none>         admiralty-default-app-gke-cluster-1998d80ea2            <none>           <none>
aws-node-lbsjj                       1/1     Running       0          3d      172.16.2.182   ip-172-16-2-182.us-west-2.compute.internal              <none>           <none>
aws-node-mlvh4                       1/1     Running       0          2d23h   172.16.2.226   ip-172-16-2-226.us-west-2.compute.internal              <none>           <none>
aws-node-t8nb2                       0/1     Terminating   0          4d4h    <none>         admiralty-default-multi-cloud-test-cluster-c962bad2df   <none>           <none>

Double check that the kube-system namespace does not have the Admiralty label:

kubectl get ns kube-system --show-labels
NAME          STATUS   AGE   LABELS
kube-system   Active   11d   kubernetes.io/metadata.name=kube-system
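
Also double-check which nodes are the virtual ones. I'm assuming here that Admiralty labels its virtual nodes with virtual-kubelet.io/provider=admiralty; if that selector returns nothing, --show-labels on one of the admiralty-* nodes will show what is actually set.

kubectl get nodes -l virtual-kubelet.io/provider=admiralty
# or inspect one of them directly:
kubectl get node admiralty-default-app-gke-cluster-1998d80ea2 --show-labels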
matt-slalom commented

It looks like the virtual nodes have taints, but the aws-node daemonset is ignoring them.

kubectl describe node admiralty-default-multi-cloud-test-cluster-c962bad2df | grep Taints
virtual-kubelet.io/provider=admiralty:NoSchedule
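
That would make sense if the daemonset carries a blanket toleration. Checking what the EKS-managed manifest actually contains (in Kubernetes, a toleration with operator: Exists and no key matches every taint, so the NoSchedule taint on the virtual nodes wouldn't keep these pods off):

kubectl -n kube-system get daemonset aws-node -o jsonpath='{.spec.template.spec.tolerations}'
# a bare entry with no key, e.g. {"operator":"Exists"}, tolerates all taints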

matt-slalom commented Jan 10, 2023

I added a label to the EKS nodes and patched the aws-node daemonset to use a node selector, but with only partial success. EKS no longer tries to deploy to the remote cluster's virtual node, but unfortunately the local cluster's Admiralty virtual node also picked up the label, so aws-node is still trying to deploy to that Admiralty node.
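
A variation I may try instead of a positive node selector: a required node affinity that excludes anything carrying the virtual-kubelet provider label, so the daemonset only considers real EKS nodes. This is only a rough sketch; it assumes the virtual nodes carry a virtual-kubelet.io/provider label (matching the taint key above), and EKS may revert changes to aws-node on the next VPC CNI add-on update.

# aws-node-affinity-patch.yaml (hypothetical file name)
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: virtual-kubelet.io/provider
                operator: DoesNotExist

kubectl -n kube-system patch daemonset aws-node --patch-file aws-node-affinity-patch.yaml

If the label key is different in a given Admiralty version, kubectl get node <virtual-node-name> --show-labels shows what there is to match on.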

adrienjt (Contributor) commented

You can exclude labels from being picked up, cf. #115.

matt-slalom commented Jan 16, 2023

Thanks for the recommendation @adrienjt. I should point out that my attempts to tweak aws-node are really kind of hacky since aws-node is supplied by EKS and isn't something I control (though I can obviously make changes to it).

Maybe I'm misunderstanding, but I think this conflict between Admiralty and EKS is something Admiralty needs to address, even if it's in documentation. Altering the aws-node daemonset might ultimately be part of the solution, but it seems to me it should be part of the Admiralty install.

Or am I missing something?
