fix: Forces cluster outputs to wait until access entries are complete #3000
Conversation
Bloody pre-commit hooks, I hates them so much. Fine, hold on, waiting on a test of the change to finish up (EKS is so slow!)... Edit: Ok, so it was just the PR title lol...
could you also remove these commands from the example READMEs since they aren't needed anymore (thank goodness)
thank you!
## [20.8.5](v20.8.4...v20.8.5) (2024-04-08)

### Bug Fixes

* Forces cluster outputs to wait until access entries are complete ([#3000](#3000)) ([e2a39c0](e2a39c0))
This PR is included in version 20.8.5 🎉
It seems permissions are not completely propagated when Terraform says "Creation completed"; I had to set an explicit dependency on the EKS module to try creating kubernetes resources after the 30s wait time...
@llavaud I had seen some funny business on apply when using this eks module in conjunction with the eks-blueprints-addons module, and specifically the Module-level
access entry creation is asynchronous and therefore should be treated as eventually consistent. However, Terraform operates in very much a synchronous manner. Deploying applications via Terraform is not recommended, but the second-best alternative is for the providers to implement retries similar to the kubectl provider
@bryantbiggs Any idea if there is an issue for that on the
Most likely, and here is the argument https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs#apply_retry_count
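For reference, a minimal sketch of wiring that argument in (the provider version pin and retry count are illustrative assumptions, and the cluster connection arguments are elided; the fuller provider setup appears later in this thread):

```hcl
terraform {
  required_providers {
    # The retry-capable kubectl provider discussed above (assumed version constraint)
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = ">= 1.14"
    }
  }
}

provider "kubectl" {
  # Retry applies so eventually-consistent cluster/auth state doesn't fail the first attempt
  apply_retry_count = 10
  # ... host, cluster_ca_certificate, token, load_config_file, etc. ...
}
```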
Ahhh, that's what that argument is doing for me (in this config I inherited)! 😁
@lorengordon I just create some resources using the kubernetes provider
can the kubectl provider apply helm charts? hmm.... |
Here's an issue to follow on retry support in the helm provider... hashicorp/terraform-provider-helm#1058
Ok, I think the problem is that the cluster outputs are still becoming available before fargate and node group resources are complete, and providers that deploy resources into those node groups are starting waaay too early. I've got a branch I'm testing that just adds those modules to the same outputs we did in this PR. If that works cleanly a few times, I'll open a PR. Probably get to it Monday...
That's fine - Kubernetes handles that. Terraform, however, does not when you have things like
That's all well and good, and technically correct I suppose. Hard to argue with. However, the eks-blueprints project, and the examples in this project, both really need to discuss how to actually do that then. It feels like there are a lot of configs that are really misleading about how they work and whether they're even a good idea. Or we could just fix it so it actually works. Anyway, while my other hypothesis may be one problem, there is another race condition causing the node group to be created before the ENIConfig is ready, which results in this error in the EKS console:
And this error in
That's even with the vpc-cni configured for "before_compute"...
If you are using the ENIConfig for secondary CIDR, we just shipped https://aws.amazon.com/about-aws/whats-new/2024/04/amazon-vpc-cni-automatic-subnet-discovery/ which I believe removes the need to set/use the ENIConfig
I'm reading that article, and the blog post, and trying to understand how to make it actionable in the context of this module, but I think I'm really too green with EKS/k8s to understand what it's saying. I inherited this EKS config, and I'm just trying to make it work more reliably. I know terraform really quite well, and know how to manipulate the DAG to make these implicit dependencies more explicit. But EKS/k8s I do not know at all. :(
tl;dr - before, you might create ENIConfigs like:

```yaml
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: $az_1
spec:
  securityGroups:
    - $cluster_security_group_id
  subnet: $new_subnet_id_1
```

But now, all you need to do is tag your secondary subnets with:

```hcl
tags = {
  "kubernetes.io/role/cni" = 1
}
```

And the VPC CNI will discover these subnets and start utilizing them.
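As a hedged illustration of where that tag lives in Terraform (the resource name, CIDRs, and availability zones below are placeholders, not taken from this thread):

```hcl
# Sketch only: tag secondary-CIDR subnets so VPC CNI subnet discovery picks
# them up, instead of creating ENIConfig objects per availability zone.
resource "aws_subnet" "secondary" {
  for_each = {
    "us-east-1a" = "100.64.0.0/18"
    "us-east-1b" = "100.64.64.0/18"
  }

  vpc_id            = aws_vpc.this.id
  availability_zone = each.key
  cidr_block        = each.value

  tags = {
    "kubernetes.io/role/cni" = 1
  }
}
```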
Aha, that was the clue I needed. Ok, thanks @bryantbiggs, sorry I was being so dense! Now I see what is causing the dependency ordering issue. The config I inherited is using the kubectl provider to create that ENIConfig, and the change I was proposing is preventing it from starting until after the node group is created. But the node group config requires that resource before it starts in order to complete successfully. Something like this, then:
I'll see what I can do using the
Just to be clear, you do not need the ENIConfig if you use the recently released subnet discoverability feature as defined in this and its subsequent links
Yep understood. It may be a little academic at this point, but since I thought of it now I just want to see if I can influence the timing of the providers and their resources using
Nifty, yeah this seems to work just fine... For the

```hcl
# Use terraform_data inputs to create resources after compute
provider "kubernetes" {
  host                   = terraform_data.eks_cluster_after_compute.input.cluster_endpoint
  cluster_ca_certificate = base64decode(terraform_data.eks_cluster_after_compute.input.cluster_certificate_authority_data)
  token                  = data.aws_eks_cluster_auth.this.token
}

# Use terraform_data inputs to create resources after compute
provider "helm" {
  kubernetes {
    host                   = terraform_data.eks_cluster_after_compute.input.cluster_endpoint
    cluster_ca_certificate = base64decode(terraform_data.eks_cluster_after_compute.input.cluster_certificate_authority_data)
    token                  = data.aws_eks_cluster_auth.this.token
  }
}

# Use module outputs directly to create resources before compute
provider "kubectl" {
  apply_retry_count      = 10
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  load_config_file       = false
  token                  = data.aws_eks_cluster_auth.this.token
}

# Force a dependency between the EKS cluster auth inputs, and the compute resources
resource "terraform_data" "eks_cluster_after_compute" {
  input = {
    cluster_endpoint                   = module.eks.cluster_endpoint
    cluster_certificate_authority_data = module.eks.cluster_certificate_authority_data
    fargate_profiles                   = module.eks.fargate_profiles
    eks_managed_node_groups            = module.eks.eks_managed_node_groups
    self_managed_node_groups           = module.eks.self_managed_node_groups
  }
}
```

And it does actually work to thread tags back through from the

```hcl
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.8.5"
  # ...
  eks_managed_node_groups = {
    foo = {
      # ...
      tags = merge(
        { for label, config in kubectl_manifest.eni_config : "eni-config/${label}" => config.id },
        local.tags
      )
    }
  }
}
```

Though, for the moment, I chose not to thread the tags through that way because of the potential for recursive references, instead just relying on
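For context, here is a rough sketch of what a `kubectl_manifest.eni_config` resource like the one referenced in the tags above might look like; the `local.secondary_subnets` map and the security group output chosen are assumptions, not taken from this thread:

```hcl
# Sketch only: one ENIConfig per availability zone, created with the kubectl
# provider that reads module.eks outputs directly (i.e. "before compute").
resource "kubectl_manifest" "eni_config" {
  for_each = local.secondary_subnets # e.g. { "us-east-1a" = "subnet-0123abcd", ... }

  yaml_body = yamlencode({
    apiVersion = "crd.k8s.amazonaws.com/v1alpha1"
    kind       = "ENIConfig"
    metadata = {
      name = each.key
    }
    spec = {
      securityGroups = [module.eks.cluster_primary_security_group_id]
      subnet         = each.value
    }
  })
}
```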
I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Description
This changeset adds `depends_on` to a few cluster outputs. This will tell terraform that these cluster outputs should not be made "known" and available until all the EKS access entries have completed. The outputs selected in this PR are limited to those used for authenticating to the EKS cluster. If it turns out more are needed, it is easy enough to add them, and I am not aware of any further repercussions to this particular usage of `depends_on`.
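As a minimal sketch of the pattern (not the module's literal `outputs.tf`; the resource names here are assumed for illustration):

```hcl
# Sketch only: the output value is unchanged, but depends_on keeps it from
# becoming "known" until the access entry resources have finished creating.
output "cluster_endpoint" {
  description = "Endpoint for the EKS Kubernetes API server"
  value       = aws_eks_cluster.this.endpoint

  depends_on = [
    aws_eks_access_entry.this,
    aws_eks_access_policy_association.this,
  ]
}
```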
Motivation and Context
Without this change, it can be difficult to order the terraform lifecycle on create and destroy when also using the helm, kubectl, and/or kubernetes providers. It requires using `state rm` and `-target` to get things to complete successfully. With this change, you can just apply and destroy normally. See also the comment and issue thread here: #2923 (comment)
Breaking Changes
None
How Has This Been Tested?
- I have updated at least one of the `examples/*` to demonstrate and validate my change(s)
- I have tested and validated these changes using one or more of the provided `examples/*` projects
- I have executed `pre-commit run -a` on my pull request

I also ran `terraform apply -auto-approve && terraform destroy -auto-approve` on the "complete" example in the blueprints-addons project, three times successfully, without using any `state rm` or target steps.