Timeout on wait_for_cluster is breaking our terraform code #1395

Closed
debarshibasak opened this issue May 25, 2021 · 11 comments · Fixed by #1420
Comments

@debarshibasak

debarshibasak commented May 25, 2021

Description

PR #1359 is breaking our installation because it takes more than 5 minutes for the Kubernetes API service to be provisioned.

Either this timeout should be considerably larger, or it should be configurable.

I am not sure whether this timeout can be configured. If it can, it is not well documented. Could someone help us out, please?

Versions

  • Terraform: v0.14.4
  • Module: eks

Code Snippet to Reproduce


 connect: connection timed out

on .terraform/modules/eks/cluster.tf line 68, in data "http" "wait_for_cluster":
68: data "http" "wait_for_cluster" {

The endpoint that wait_for_cluster is trying to fetch is accessible from outside the cluster. I verified that with wget.

Expected behavior

The cluster gets provisioned.

Actual behavior

The cluster does not get provisioned, but the kubeconfig is acquired.

@debarshibasak debarshibasak changed the title from "Timeout on wait_for_cluster is breaking out terraform code" to "Timeout on wait_for_cluster is breaking our terraform code" May 25, 2021
@barryib
Member

barryib commented May 28, 2021

Thanks @debarshibasak for opening this issue.

Are you deploying a private EKS cluster (with a private endpoint)? Does the timeout happen only once during your apply? I mean, does a second apply work?

I'm trying to figure out if there is a network issue.

@debarshibasak
Author

Hey @barryib, thanks for your question. We are deploying a public cluster. The endpoint is reachable later and returns 200, but it takes quite some time.

It takes more than 500 seconds. I noticed that wait_for_cluster has a hardcoded timeout parameter. I think it would be nice if that were customizable.

Yes, the second apply works.

@barryib
Member

barryib commented May 28, 2021

5 minutes was the previous setting, and so far it was working well.

Does that mean the EKS cluster is created but isn't ready to use within 5 minutes? Is something else going wrong? Any thoughts?

As for the timeout variable, yes we can add it. Please open a PR.

@debarshibasak
Author

Terraform ended with a timeout error on "/healthz"; that's why I created this ticket. Let me try to reproduce this a few more times. It might just be a flaky issue.

@SNA-rh
Contributor

SNA-rh commented Jun 1, 2021

Hey, we have noticed this behavior as well. Our TAM/AWS support suggested waiting a longer period of time before we attempt to configure our aws-auth configmap.

I've attempted to fix this in #1420. Please let me know if this is acceptable.

@daroga0002
Contributor

Which region are you trying to create the EKS cluster in?

@SNA-rh
Contributor

SNA-rh commented Jun 1, 2021

Not sure about the OP, but we're seeing this in us-east-2. We've also seen it in us-east-1.

@SNA-rh
Contributor

SNA-rh commented Jun 9, 2021

@barryib Is there an ETA on when this will be in a released version? We're still seeing timeouts while building our clusters and would like to increase it as soon as we can.

@barryib
Member

barryib commented Jun 9, 2021

Just released it in v17.1.0
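
For reference, a minimal sketch of how the new timeout might be set from a root module. The input name wait_for_cluster_timeout and the value 1200 (presumably seconds) are taken from a later comment in this thread. The cluster name, the Kubernetes version, and the two variables are hypothetical placeholders, and the other input names (vpc_id, subnets) are assumptions that should be checked against the module documentation for the version in use.

```hcl
# Hypothetical placeholders for an existing network; not from the thread.
variable "vpc_id" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "17.1.0" # the release mentioned above

  cluster_name    = "example-cluster" # hypothetical
  cluster_version = "1.20"            # hypothetical
  vpc_id          = var.vpc_id        # assumed input name
  subnets         = var.subnet_ids    # assumed input name

  # Input name and value as quoted later in this thread.
  wait_for_cluster_timeout = 1200
}
```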

@lrstanley

lrstanley commented Jul 30, 2021

Seems like I am still running into this issue (even with the cluster timeout specified). See the following:

++ terraform -chdir=/tmp/build/4d47ec0b/plan-artifacts apply tfplan
[...]
module.eks.aws_eks_cluster.this[0]: Creating...
module.eks.aws_eks_cluster.this[0]: Still creating... [10s elapsed]
[...]
module.eks.aws_eks_cluster.this[0]: Still creating... [11m10s elapsed]
module.eks.aws_eks_cluster.this[0]: Still creating... [11m20s elapsed]
module.eks.aws_eks_cluster.this[0]: Creation complete after 11m26s [id=cloud-tooling]
module.eks.data.http.wait_for_cluster[0]: Reading...
module.eks.data.http.wait_for_cluster[0]: Still reading... [10s elapsed]
module.eks.data.http.wait_for_cluster[0]: Still reading... [20s elapsed]
module.eks.data.http.wait_for_cluster[0]: Still reading... [30s elapsed]
module.eks.data.http.wait_for_cluster[0]: Still reading... [40s elapsed]
module.eks.data.http.wait_for_cluster[0]: Still reading... [50s elapsed]
╷
│ Error: Error making request: Get "https://[...].gr7.us-east-1.eks.amazonaws.com/healthz": read tcp [...]:50656->[...]:443: read: connection reset by peer
│ 
│   with module.eks.data.http.wait_for_cluster[0],
│   on .terraform/modules/eks/data.tf line 89, in data "http" "wait_for_cluster":
│   89: data "http" "wait_for_cluster" {
│ 
╵

I have wait_for_cluster_timeout = 1200, and I'm running v17.1.0. It fails fairly consistently while waiting for the creation, and it doesn't look like it actually waits for the time I've specified (it barely makes it to 1m15s or so, based on build timestamps). A re-run immediately afterwards passes, however.

Edit: if it's important, Terraform v1.0.0
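
The log above suggests that a single connection reset ends the read immediately, well before the configured timeout. One possible workaround, sketched below under assumptions and not taken from the module, is to poll /healthz in a retry loop so that transient connection errors are tolerated until a deadline. The resource name, both variables, and the 5-second interval are hypothetical; the sketch also assumes a POSIX shell and curl are available where Terraform runs.

```hcl
# Hypothetical inputs; the endpoint would typically come from the cluster's outputs.
variable "cluster_endpoint" {
  description = "HTTPS endpoint of the EKS API server (assumed to be known at this point)."
  type        = string
}

variable "wait_timeout_seconds" {
  description = "Approximate upper bound on how long to keep polling."
  type        = number
  default     = 1200
}

resource "null_resource" "wait_for_healthz" {
  # Re-run the probe whenever the endpoint changes.
  triggers = {
    endpoint = var.cluster_endpoint
  }

  provisioner "local-exec" {
    interpreter = ["/bin/sh", "-c"]

    environment = {
      ENDPOINT = var.cluster_endpoint
      DEADLINE = tostring(var.wait_timeout_seconds)
    }

    command = <<-EOT
      elapsed=0
      while [ "$elapsed" -lt "$DEADLINE" ]; do
        # Any failure (reset, timeout, non-2xx) is treated as "not ready yet".
        # -k skips TLS verification, since this only checks liveness.
        if curl -fsk --max-time 10 "$ENDPOINT/healthz" > /dev/null; then
          exit 0
        fi
        sleep 5
        # Counts only the sleep time, so the real deadline is approximate.
        elapsed=$((elapsed + 5))
      done
      echo "TIMEOUT waiting for $ENDPOINT/healthz" >&2
      exit 1
    EOT
  }
}
```

Resources that need a reachable API server could then declare depends_on = [null_resource.wait_for_healthz].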

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 20, 2022