
AWS nodes become permanently unreachable after updating aws-auth ConfigMap #1847

Closed
iridian-ks opened this issue Dec 23, 2021 · 5 comments · Fixed by #1926
Labels
  • impact/usability: Something that impacts users' ability to use the product easily and intuitively
  • kind/bug: Some behavior is incorrect or out of spec
  • resolution/fixed: This issue was fixed
Milestone
0.70
Comments

@iridian-ks

Hello!

  • Vote on this issue by adding a 👍 reaction
  • To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already)

Issue details

I'm managing k8s worker nodes in a PCI zone. We manage firewall rules (Palo Alto) by EC2 tags, which means each PCI-zoned workload needs to be scheduled on the right nodes. I've automated provisioning different EKS node groups with Pulumi, and each node group gets its own dedicated IAM role as part of this.

In EKS, for nodes to join the cluster, their IAM role needs to be listed in the aws-auth ConfigMap. When a new PCI workload comes along, I append its node groups to the list and run Pulumi, which does everything perfectly in spinning up the new node group in the EKS cluster.
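For context, the aws-auth ConfigMap maps IAM roles to Kubernetes identities so that nodes assuming those roles can register with the cluster. A minimal sketch of one mapRoles entry (the account ID and role name below are placeholders, not values from this issue):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    # One entry per node-group IAM role; nodes whose role is missing
    # here cannot join (or re-join) the cluster.
    - rolearn: arn:aws:iam::111122223333:role/pci-node-group-role
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
```

Deleting this ConfigMap, even briefly, removes every role mapping at once, which is why a delete-then-create update can knock existing node groups out.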

The problem is that part of this updates aws-auth. Pulumi only supports deleting and then creating ConfigMaps, which is certainly best practice for almost all use cases, and this bug is really an AWS issue. But when Pulumi deletes and re-creates the ConfigMap, the existing NodeGroup is permanently stuck as unschedulable and I need to re-create the entire NodeGroup.

What Pulumi could offer is an immutable field in ConfigMapArgs that defaults to True, keeping the current behavior, but letting individual users set it to False if they encounter a use case like this one.
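As a point of reference, the Kubernetes API already defines a native `immutable` field on ConfigMaps (GA since v1.21), which a provider could plausibly use as exactly this hint. A sketch of what the suggested opt-out might look like at the manifest level:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
# false = treat as mutable: update in place instead of delete-and-recreate.
# The proposal above would default this to true to preserve current behavior.
immutable: false
data:
  mapRoles: |
    ...
```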

Steps to reproduce

  1. Create an EKS cluster
  2. Create a node group with a dedicated IAM role and an aws-auth ConfigMap that allows it to join the cluster
  3. Create a new node group in Pulumi with a new IAM role and update the ConfigMap to include both IAM roles

Expected: Both the new and existing node groups are in the EKS cluster
Actual: Existing node groups can no longer join the EKS cluster
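The steps above might be sketched in a Pulumi program roughly as follows. This is a non-runnable illustration, assuming the @pulumi/eks package; all resource names are hypothetical and not taken from the reporter's code:

```typescript
import * as aws from "@pulumi/aws";
import * as eks from "@pulumi/eks";

// Step 2: a node group with its own dedicated IAM role.
const roleA = new aws.iam.Role("node-group-a-role", {
    assumeRolePolicy: aws.iam.assumeRolePolicyForPrincipal({
        Service: "ec2.amazonaws.com",
    }),
});

// Step 1: the cluster; instanceRoles feeds the aws-auth ConfigMap.
const cluster = new eks.Cluster("pci-cluster", {
    skipDefaultNodeGroup: true,
    instanceRoles: [roleA],
});

const groupA = new eks.NodeGroup("node-group-a", {
    cluster: cluster,
    instanceProfile: new aws.iam.InstanceProfile("a-profile", { role: roleA }),
});

// Step 3: adding a second role to instanceRoles and a second NodeGroup
// triggers an aws-auth update; the provider's delete-then-create of that
// ConfigMap is what leaves group A unable to rejoin.
```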

@iridian-ks iridian-ks added the kind/bug Some behavior is incorrect or out of spec label Dec 23, 2021
@ruckc

ruckc commented Dec 23, 2021

I've seen this when the updated aws-auth map doesn't contain the account creating the EKS cluster.

@iridian-ks
Author

iridian-ks commented Dec 23, 2021

I don't think there's an issue with the ConfigMap itself. I re-create the NodeGroup after it breaks with a pulumi up --target-replace urn=...NodeGroup..., which deletes all the nodes, re-creates them, and then everything works again.

This tells me that everything is fine except for a window in which the kubelet can't join. I imagine it's either a kubelet or an EKS issue, but it's avoidable if Pulumi doesn't delete the ConfigMap.

Apologies if I'm not fully understanding.

@viveklak
Contributor

What Pulumi could offer is an immutable field in the ConfigMapArgs that defaults to True to keep with the current behavior but allow individual users to decide whether or not to set it to False if they are encountering a use-case like this one.

It does appear we don't currently take the immutable field on the ConfigMap as a hint to override the replace logic. cc @lblackstone for thoughts here.

@viveklak viveklak added the impact/usability Something that impacts users' ability to use the product easily and intuitively label Dec 28, 2021
@lblackstone
Member

These links are related:
#1568 (comment)
#1775

To summarize, we're planning to use the replaceOnChanges resource option to make the replace behavior user-configurable rather than embedding that logic in the provider. This should give you the flexibility required to make this work.
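Under that plan, replacement would become an opt-in per-resource decision. A sketch of what opting a ConfigMap back into replacement could look like, assuming the provider stops forcing replacement itself (replaceOnChanges is an existing Pulumi resource option; the data value here is a placeholder):

```typescript
import * as k8s from "@pulumi/kubernetes";

const awsAuth = new k8s.core.v1.ConfigMap("aws-auth", {
    metadata: { name: "aws-auth", namespace: "kube-system" },
    data: { mapRoles: "..." },
}, {
    // Resource option: force delete-and-recreate whenever `data` changes.
    // Omitting it would leave updates in place, which is what this issue needs.
    replaceOnChanges: ["data"],
});
```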

@viveklak
Contributor

While pulumi/pulumi#9158 is still open, we are adding a new provider config key in v3.17.0 to treat configmaps as immutable by default.
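The comment doesn't name the config key, so the key below is an assumption on my part; confirm it against the pulumi-kubernetes v3.17.0 release notes before relying on it:

```shell
# Assumed key name (not stated in this thread): opt the provider into
# treating ConfigMaps as mutable, so updates patch in place.
pulumi config set kubernetes:enableConfigMapMutable true
```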

@lukehoban lukehoban added this to the 0.70 milestone Mar 20, 2022