AWS nodes become permanently unreachable after updating aws-auth ConfigMap #1847
Labels: impact/usability, kind/bug, resolution/fixed
Hello!
Issue details
I'm managing k8s worker nodes in a PCI zone. We manage firewall rules (Palo Alto) keyed on EC2 tags, which means each PCI-zoned workload needs to be scheduled on the right nodes. I've automated provisioning of the different EKS node groups with Pulumi, and each node group gets its own dedicated IAM role as part of this.
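For context, a minimal sketch of the per-workload provisioning, assuming hypothetical names (the workload name, policy list, and tag handling are illustrative, not our real config):

```python
import json

import pulumi_aws as aws

workload = "pci-payments"  # hypothetical workload name

# Dedicated IAM role per node group, so the Palo Alto rules keyed on
# EC2 tags can isolate each PCI workload's nodes.
node_role = aws.iam.Role(
    f"{workload}-node-role",
    assume_role_policy=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }),
)

# Managed policies every EKS worker node needs.
for i, arn in enumerate([
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
]):
    aws.iam.RolePolicyAttachment(
        f"{workload}-node-policy-{i}",
        role=node_role.name,
        policy_arn=arn,
    )

# The node group itself (launch template, ASG, EC2 tags, etc.) is
# elided; the part that matters here is that node_role is unique
# per node group.
```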
In EKS, in order for nodes to join the cluster, their IAM role needs to be mapped in the aws-auth ConfigMap. When a new PCI workload comes along, I append it to the list of node groups and run Pulumi, which does everything perfectly in spinning up that new node group in the EKS cluster.
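The aws-auth update step looks roughly like this (a sketch: the account and role ARNs are placeholders, and the real program assembles them from Pulumi outputs rather than literals):

```python
import pulumi_kubernetes as k8s

# Placeholder ARNs; the real stack collects these from the node roles above.
node_role_arns = [
    "arn:aws:iam::111111111111:role/pci-payments-node-role",
    "arn:aws:iam::111111111111:role/pci-reporting-node-role",
]

# One mapRoles entry per node group role.
map_roles = "\n".join(
    f"- rolearn: {arn}\n"
    "  username: system:node:{{EC2PrivateDNSName}}\n"
    "  groups:\n"
    "    - system:bootstrappers\n"
    "    - system:nodes"
    for arn in node_role_arns
)

# Because metadata.name is pinned to "aws-auth", any change to `data`
# makes Pulumi replace (delete then create) the ConfigMap.
aws_auth = k8s.core.v1.ConfigMap(
    "aws-auth",
    metadata=k8s.meta.v1.ObjectMetaArgs(
        name="aws-auth",
        namespace="kube-system",
    ),
    data={"mapRoles": map_roles},
)
```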
The problem is that part of this run updates aws-auth. Pulumi only supports deleting and then re-creating ConfigMaps in general, which is certainly best practice for pretty much all use cases, and this fragility is really an AWS issue. But when Pulumi deletes aws-auth, the existing node groups' role mappings disappear, the NodeGroup is left permanently stuck as unschedulable, and I need to re-create the entire NodeGroup.
What Pulumi could offer is an `immutable` field in `ConfigMapArgs` that defaults to `True`, keeping the current behavior, but lets individual users decide whether to set it to `False` if they are encountering a use case like this one.
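Purely as a sketch of the proposed API (the argument name, default, and semantics here are my suggestion, not an existing Pulumi API), reusing the names from the sketches above:

```python
# Hypothetical: opt out of delete-then-create for this one resource
# and have Pulumi update the ConfigMap in place instead.
aws_auth = k8s.core.v1.ConfigMap(
    "aws-auth",
    immutable=False,  # proposed flag; defaults to True (current behavior)
    metadata=k8s.meta.v1.ObjectMetaArgs(
        name="aws-auth",
        namespace="kube-system",
    ),
    data={"mapRoles": map_roles},
)
```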
Steps to reproduce
Expected: New and existing node groups would both be in the EKS cluster
Actual: Nodes in existing node groups can no longer join the EKS cluster