[Bug] IAM Service Account creation retries fail because of cloudformation stack status #4981
Comments
When the Cloudformation stack created by eksctl for the creation of an IAM role needed for an IAM service account is in the ROLLBACK_COMPLETE status, eksctl should detect it and try to reconcile the desired config and the actual state of the cluster and IAM resources. Issue eksctl-io#4981
@ndegory Thanks for opening the detailed issue and a follow-up PR ⭐ We will review the issue and PR soon. 👍🏻
Hi @ndegory. You are correct.
@Skarlso, fair enough, but right now, under certain conditions, the iamserviceaccount command lets you think the cluster state matches the specifications from the config server YAML file (because no actions, no errors), although some resources are not in the expected state. This action is not atomic, which is problematic for an imperative command.
@ndegory We decided to pull this into planning and will think of a nice solution that will still leave the command consistent with other commands. :) There was something similar previously that deals with the nature of iamserviceaccount commands here: #4941. It's not similar in the problem, but similar in nature, in that existing or non-existing resources throw off the create command and leave things in an inconsistent state. Maybe we can still do something here that will not result in a problematic environment or is more user friendly. We'll discuss this with the team.
We will reproduce this on our side and help with the PR for adding the validations.
@ndegory Ok, so, the decision is as follows... create will still not be much aware about the circumstances and the infrastructure you run it on. If there were things that you created outside of eksctl that won't really matter for eksctl. Again, it's not a declarative tool. That said! We can certainly improve upon this part:
Mainly this: (the missing pre-requisite). Are you willing to adjust your PR to do a check for this resource to exist and only proceed if yes? :)
@Skarlso, yes, I can give it a try. I would also like to add one more thing, which was kind of covered by the current PR: react differently when there's an existing Cloudformation stack for that IAM service account, and exit with an error when that existing stack is in ROLLBACK_COMPLETE status, instead of the current behavior, which ignores it and considers that all is well and there's nothing to do. Would that be ok for you?
@ndegory Sadly, that would be going too far. I mean, that could result in detecting a stack which isn't the stack you want. Or happens to be there because of the same name. If there had been a stack that was created during the create and had failed, the create would have failed, right? Or are you saying that the create just happily chugged on even if the stack failed to reach CREATE_COMPLETE? If there was a stack created separately from
@Skarlso, correct, a stack created by a previous call to
Yeah, okay, that is a fair point. I agree to that. Thanks for the explanation! What I was trying to convey is that it should warn the user and not attempt to remedy the situation. Is that okay?
we're aligned!
Excellent! :)
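As a rough illustration of the behavior agreed on above (warn the user about a stack in ROLLBACK_COMPLETE, but do not try to repair it), here is a minimal Go sketch. The existingStack type and the checkExistingStack function are hypothetical and are not taken from the eksctl code base; only the status strings are real CloudFormation values.

```go
package main

import "fmt"

// CloudFormation stack status values referred to in this thread.
const (
	statusCreateComplete   = "CREATE_COMPLETE"
	statusRollbackComplete = "ROLLBACK_COMPLETE"
)

// existingStack is a hypothetical, simplified view of the stack that a
// previous create call left behind for an IAM service account.
type existingStack struct {
	Name   string
	Status string
}

// checkExistingStack decides what the create command should do when a stack
// may already exist. It never attempts to fix anything: it only reports
// whether creation should be skipped and, when the stack is in
// ROLLBACK_COMPLETE, returns a warning to show to the user.
func checkExistingStack(s *existingStack) (skip bool, warning string) {
	if s == nil {
		// No stack yet: creation can proceed as usual.
		return false, ""
	}
	if s.Status == statusRollbackComplete {
		return true, fmt.Sprintf(
			"stack %q is in %s; the IAM role was not created, delete the service account and create it again",
			s.Name, s.Status)
	}
	// Any other existing stack is left alone without a warning.
	return true, ""
}
```

The warning text points at the delete-then-create workaround described in the issue body; the exact wording is an assumption.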
The PR in question is still a draft; moving this ticket to the Blocked column until the PR is ready for review.
What were you trying to accomplish?
Infrastructure provisioning workflow with 2 steps: first Terraform for IaaS resources, then eksctl for EKS related resources. The Terraform job includes the creation of custom IAM policies that are used by the service accounts defined in the EKS cluster config.
When the configuration is not consistent between these two steps, the EKS related job may fail. Fixing it should only require fixing the configuration and running the pipeline again.
What happened?
Creation of IAM resources and the Kubernetes service account with the eksctl create cluster or eksctl create iamserviceaccount command fails when pre-requisites are not there (for instance an IAM policy). Fixing the Terraform configuration is enough to let the Terraform job fix the pre-requisites, but when the EKS job runs, it fails to recover.
This is caused by:
So far the workaround is to run the delete iamserviceaccount command for the service account impacted by the issue, and then run the create iamserviceaccount command again with the cluster config file, but this is not compatible with a declarative approach.
How to reproduce it?
Creation of a cluster, with a config file including an IAM service account referring to an IAM policy not yet created (the missing pre-requisite):
Notice the last service account (another-app), the policy has deliberately not been created.
Logs
Versions
I'll proceed with a PR that implements a more reliable workflow.