Cannot Create a New Cluster w/ 17.21.0 #1635
Comments
I am investigating this.
@bengesoff @hfuss I checked, and it seems the issue is occurring not during cluster creation but rather during refresh. Could you confirm?
@stevehipwell this is related to the data source eks_cluster which we introduced in your PR; it is used just to fetch the values referenced in terraform-aws-eks/modules/node_groups/launch_template.tf (lines 17 to 18 in 306ad72).
Now I am looking into bootstrap.sh from AWS, and it seems those parameters are optional and can be fetched by the script itself (if not set up): https://github.com/awslabs/amazon-eks-ami/blob/dbba9499841d3936d285bd2427f90ef0cdd385b3/files/bootstrap.sh#L20-L21. What I am thinking now is to just remove the data source and those two parameters from the template. Was there any special reason why this should be fetched by Terraform instead?
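For context, a minimal sketch of the kind of lookup being discussed — the data source resolves the cluster endpoint and CA so they can be passed into the userdata template. Names here are illustrative, not the module's exact code:

```hcl
# Hypothetical sketch of the eks_cluster data-source lookup under discussion.
# This fails when the cluster does not exist yet (fresh state), because the
# data source is evaluated during plan/refresh.
data "aws_eks_cluster" "default" {
  name = var.cluster_name
}

locals {
  # Values that would instead be fetched by bootstrap.sh itself if omitted:
  cluster_endpoint    = data.aws_eks_cluster.default.endpoint
  cluster_auth_base64 = data.aws_eks_cluster.default.certificate_authority[0].data
}
```

Removing the data source would mean bootstrap.sh performs a describe-cluster call at node boot instead, as the linked AMI script shows.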
Created a draft PR to remove the code causing the issues, but this will require testing.
@daroga0002 Sounds like you're on it, but just to follow up: that's not what I am observing. I am starting with an empty Terraform state and attempting to create a new VPC and EKS cluster, and it fails before it even gets started. The patch I opened in #1636 allowed me to create a brand new cluster, so your suggestion of removing the data source sounds right to me.
@hfuss #1636 only works because you're not using the functionality this enables and aren't setting the related inputs. @daroga0002 #1638 will only work for EKS optimised AMIs with bootstrap.sh, and not for fully customised AMIs.
@daroga0002 I've created #1639 which should fix the issue and be a valid implementation of the pattern added in #1580. |
I thought about this again, and in general we currently require the CA and endpoint only when using an EKS optimized AMI; but as the EKS optimized AMI's bootstrap.sh can fetch those values itself, they aren't strictly required there. On the other side, when a user wants to use a non-EKS-optimized AMI, they can always pass this data through their own userdata. One more thing I see is probably a second bug in the template (or maybe I don't see the use case): we call bootstrap.sh (so this is EKS optimized), and below we have the custom userdata as well.
@daroga0002 I'll try and answer your points below.

It wouldn't be easy, if even possible, to pass the dynamic endpoint and CA into userdata for a non-EKS optimised AMI, which is why I added them as variables into the userdata template to be consumed by the custom userdata embedded below. You are correct that bootstrap.sh on an EKS optimised AMI can download these values itself, but I would rather they were passed in, as happens by default, to limit the moving parts and so that nothing which can be generated in advance is left until runtime and done on each node. This is what we do for the other worker userdata, so it is consistent (MNGs do this too behind the scenes).

As far as the template goes: the first block either modifies bootstrap.sh, if we're getting merged userdata because we didn't pass in an AMI ID, or it sets the variables without needing to modify bootstrap.sh to source them from another file. Then we add the custom userdata. Finally we call bootstrap.sh if we have an EKS optimised AMI that isn't going to merge the userdata. If you have a non-EKS optimised AMI, your custom userdata needs to replicate bootstrap.sh or your node won't join the cluster.
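The three-part flow described above can be sketched as a `templatefile` call; the variable and file names here are assumptions for illustration, not the module's exact implementation:

```hcl
# Hypothetical sketch of the userdata assembly described above.
locals {
  node_group_userdata = templatefile("${path.module}/templates/userdata.sh.tpl", {
    # 1. Either patch bootstrap.sh (merged userdata path, no AMI ID given)
    #    or export the variables so bootstrap.sh can consume them directly.
    ami_is_eks_optimized = var.ami_is_eks_optimized
    cluster_name         = var.cluster_name
    cluster_endpoint     = var.cluster_endpoint    # passed in, not re-fetched
    cluster_auth_base64  = var.cluster_auth_base64

    # 2. The custom userdata is embedded next.
    custom_userdata = var.custom_userdata

    # 3. Finally bootstrap.sh is invoked, but only for EKS optimised AMIs
    #    where MNG is not going to merge the userdata itself.
  })
}
```

The key design point is that the endpoint and CA arrive as template variables, so custom userdata for a non-optimised AMI can consume them without any runtime lookup.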
To clarify the basic principles of the changes in #1580. Firstly, given 2 MNGs whose only difference is that one sets an AMI ID, the resulting userdata would be functionally the same; so if the passed-in AMI ID is the current default, you get the same outcome as if you hadn't passed it in. Secondly, if it's not an EKS optimised AMI, the required variables should be available to the custom bootstrap script. With this pattern I could take the current AWS EKS AMI ID and pass it into the module while setting it as not EKS optimised, and then put a modified version of bootstrap.sh into the custom userdata to fix bugs such as awslabs/amazon-eks-ami#782 (at the cost of needing to manually update the AMI ID).
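That pattern might look something like the following; the attribute names are assumptions for sketch purposes, not the module's exact schema:

```hcl
# Hypothetical node group definition using the pattern described above:
# pin the current default EKS AMI ID but mark it as not EKS optimised, so
# a patched bootstrap.sh in the custom userdata runs instead of the stock one.
node_groups = {
  pinned = {
    ami_id               = "ami-0123456789abcdef0" # hypothetical AMI ID
    ami_is_eks_optimized = false                   # assumed flag name

    # Custom userdata must replicate (a fixed copy of) bootstrap.sh,
    # or the node will not join the cluster.
    custom_userdata = file("${path.root}/patched-bootstrap.sh")
  }
}
```

The trade-off stated above applies: the AMI ID must now be updated manually as new AMIs are released.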
@antonbabenko it seems this is a bug which was not caught during testing, as the tests were done from scratch. This is an issue which happens during the refresh phase of Terraform. I am now discussing with @stevehipwell the best way to address this problem. The question is: should we somehow mark the GitHub release as buggy in the changelog?
@daroga0002 off the top of my head, the following solutions to the buggy release are possible, but they're not necessarily the only ones.
@daroga0002 I am not sure what the problem is in detail because I was not following this discussion. We need to release a bug fix as a minor release (17.22.0); there is no need to highlight/mark something as buggy.
Hi, we are also now in a predicament whereby we are unable to deploy new clusters; the update happened during some of our upgrades and testing. Do you have an idea when we could expect this to be fixed? Many thanks.
@cmsmith7 can you not just use the previous version?
17.21.0 effectively blocks creating new clusters with managed node groups (v17.20.0 of the eks module does not), and #1639 looks like a proper fix to me.
FYI, as a beginner-to-intermediate TF user I spent ~5 hours today trying to debug this issue. I, mistakenly, did not specify a version number when pulling this module (and verified that rolling back to 17.20.0 works perfectly). Additionally, this issue page hasn't been indexed yet on Google, so searching for the output error will get you meager results (some referring to githubmemory.com, which eventually got me here). Just thought I'd add my two cents so we can make this PR a priority...
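The workaround mentioned above — rolling back to 17.20.0 — comes down to pinning the module version instead of pulling the latest release. A minimal sketch (the non-version inputs shown are illustrative placeholders, not a complete configuration):

```hcl
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "17.20.0" # pin to the last release before the regression

  # Illustrative placeholder inputs; fill in per your environment.
  cluster_name = "example-cluster"
  vpc_id       = var.vpc_id
  subnets      = var.subnets
}
```

Always setting an explicit `version` (or at least an upper bound such as `~> 17.20`) avoids silently picking up a broken release.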
@daroga0002 @stevehipwell Do you guys know when the fix will be ready?
@antonbabenko I think @daroga0002 is just waiting on your review: #1639 (review).
@stevehipwell I have just merged it and released v17.22.0. During the night I received even more emails from GitHub and didn't have a chance to read the one where Dawid mentioned me. Thank you very much for your contribution!
@antonbabenko this one was on me, so I'm glad I could help get it fixed. I'm still slightly unclear as to why using the data source failed in the way it did, but in hindsight the current solution was the one I should have stuck with, as it's significantly simpler to pass variables than to re-fetch them.
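The "pass variables rather than re-fetch" approach can be sketched as follows; the names are assumptions illustrating the pattern, not the exact diff in #1639:

```hcl
# Hypothetical sketch: modules/node_groups receives the values as inputs
# instead of looking them up with a data source.
variable "cluster_endpoint" {
  description = "EKS cluster API endpoint, passed down from the parent module"
  type        = string
}

variable "cluster_auth_base64" {
  description = "Base64-encoded cluster CA certificate, passed down likewise"
  type        = string
}

# In the parent module the values come from the cluster resource itself, so
# Terraform's dependency graph guarantees the cluster exists first and no
# plan-time lookup of a not-yet-existing cluster is needed:
#
#   cluster_endpoint    = aws_eks_cluster.this[0].endpoint
#   cluster_auth_base64 = aws_eks_cluster.this[0].certificate_authority[0].data
```

Because resource attributes are only resolved after the resource is created, this sidesteps the refresh-time failure entirely.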
As I see it, this is some Terraform bug with dependencies during refresh of the data source in terraform-aws-eks/modules/node_groups/main.tf (line 105 in 6b3a8e6); even with the dependencies in place, it was still failing.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Description
The module fails to create new clusters because it is trying to look up an EKS cluster that does not exist by default.
Before you submit an issue, please perform the following first:

1. Remove the `.terraform` directory (! ONLY if state is stored remotely, which hopefully you are following that best practice!): `rm -rf .terraform/`
2. Run `terraform init`
Versions
Reproduction
Steps to reproduce the behavior:
Attempt to create an EKS cluster w/ node groups. For reference I'm using region `ap-southeast-2`.

Code Snippet to Reproduce
Expected behavior
I can use this module to create an EKS cluster w/ node groups
Actual behavior
It's failing to create a cluster w/ node groups because it cannot find a default EKS cluster.
Terminal Output Screenshot(s)
Additional context
see #1633 (comment)