Initial version of EKS terraform config #986

Merged
merged 2 commits into from
Jan 10, 2020

Conversation

aLekSer
Collaborator

@aLekSer aLekSer commented Aug 8, 2019

Adds a module for the EKS cluster, along with documentation and an example that uses submodules.

Disabled agones-system/agones-ping-udp-service for now, as it breaks the helm_agones deployment.
Two more node pools (worker groups), for metrics and system, still need to be added with taints.

Closes #966.

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: 17d861c1-c44a-4323-bf1f-8f1070a78c8f

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/986/head:pr_986 && git checkout pr_986
  • helm install install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.0.0-39802f1

@aLekSer
Collaborator Author

aLekSer commented Aug 26, 2019

Today I spent a few hours trying to get a config similar to the one created by eksctl, but after a small change in the configuration, terraform apply times out on the resource destroy step.
What I cannot get with this configuration is the Elastic IP in the kubectl get gs output; I receive internal IPs instead.

@agones-bot
Collaborator

Build Failed 😱

Build Id: 4ae5f801-15b8-4e63-98eb-6fdc06d59471

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@aLekSer
Collaborator Author

aLekSer commented Aug 26, 2019

E2E test failure:

Step #20: time="2019-08-26 16:11:35.903" level=info msg="waiting for fleet condition" fleet=simple-fleet-wds5x
Step #20: --- FAIL: TestGameServerSelfAllocate (316.09s)
Step #20: gameserver_test.go:270: Could not get a GameServer ready: waiting for {udp-server [{gameport Dynamic 7654 0 UDP}] {false 0 0 0} {{ 0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] nil [] } {[] [] [{udp-server gcr.io/agones-images/udp-server:0.14 [] [] [] [] [] {map[cpu:{{30 -3} {<nil>} 30m DecimalSI} memory:{{33554432 0} {<nil>} BinarySI}] map[cpu:{{30 -3} {<nil>} 30m DecimalSI} memory:{{33554432 0} {<nil>} BinarySI}]} [] [] nil nil nil IfNotPresent nil false false false}] <nil> <nil> map[] <nil> false false false <nil> nil [] nil [] [] <nil> nil [] <nil>}}} GameServer instance readiness timed out (): waiting for GameServer to be Ready default/udp-serverl5c7r: timed out waiting for the condition

@agones-bot
Collaborator

Build Failed 😱

Build Id: d4fd5a25-e4b1-41ff-8ecb-9faeaf23ac51

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: 680715f7-60e8-49a4-9fce-30d26816ef87

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/986/head:pr_986 && git checkout pr_986
  • helm install install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.0.0-7821692

@aLekSer
Collaborator Author

aLekSer commented Aug 30, 2019

Ran into an issue where terraform destroy does not succeed. I need to try worker_groups_launch_template, as suggested here:
terraform-aws-modules/terraform-aws-eks#285

@markmandel markmandel added the feature-freeze-do-not-merge Only eligible to be merged once we are out of feature freeze (next full release) label Sep 10, 2019
@markmandel markmandel removed the feature-freeze-do-not-merge Only eligible to be merged once we are out of feature freeze (next full release) label Sep 17, 2019
@agones-bot
Collaborator

Build Succeeded 👏

Build Id: 3433e739-415d-4158-900f-9e5c4458c0ae

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/986/head:pr_986 && git checkout pr_986
  • helm install install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.1.0-635bce0

@aLekSer
Collaborator Author

aLekSer commented Oct 10, 2019

I have tested switching from worker_groups to worker_groups_launch_template, and terraform destroy now works correctly when we use it without a VPC. I will update this PR soon.

@aLekSer aLekSer force-pushed the terraform-eks branch 2 times, most recently from e1f6146 to baa5f76 Compare October 11, 2019 10:33
@agones-bot
Collaborator

Build Failed 😱

Build Id: 72e06b41-8b0c-4ab1-aab1-90d08fcbc75a

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@aLekSer
Collaborator Author

aLekSer commented Oct 11, 2019

It seems we have hit this issue:
hashicorp/terraform-provider-aws#9101

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: ad767902-80bf-46a0-88b0-07d9ba820914

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/986/head:pr_986 && git checkout pr_986
  • helm install install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.1.0-e1f6146

@aLekSer aLekSer marked this pull request as ready for review October 11, 2019 12:03
@aLekSer
Collaborator Author

aLekSer commented Oct 11, 2019

There is still a problem with terraform destroy, which times out when we use both the EKS and Helm modules:

module.eks_cluster.module.vpc.aws_internet_gateway.this[0]: Still destroying... [id=igw-07e4294e9af7d91ec, 9m0s elapsed]
module.eks_cluster.module.vpc.aws_internet_gateway.this[0]: Still destroying... [id=igw-07e4294e9af7d91ec, 9m10s elapsed]

I have added a workaround for this issue (also in the docs):

terraform destroy -target module.eks_cluster.module.eks --auto-approve
terraform destroy

I think we should wait for a fix in the upstream Terraform provider repo, or propose some way to terraform destroy resources one by one.

Error: Error waiting for internet gateway (igw-07e4294e9af7d91ec) to detach: timeout while waiting for state to become 'detached' (last state: 'detaching', timeout: 15m0s)   
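
As a rough sketch (not part of this PR), the two-step workaround above could be wrapped in a small shell function. The function name teardown_eks is hypothetical; the module path is the one used in the commands above. Unlike the two separate commands, this version chains them with &&, so the full destroy only starts if the targeted destroy succeeded.

```shell
# teardown_eks: hypothetical helper wrapping the two-step destroy workaround.
teardown_eks() {
  # Step 1: destroy only the EKS module, so the cluster goes away first.
  terraform destroy -target module.eks_cluster.module.eks --auto-approve &&
    # Step 2: destroy everything that is left (prompts for confirmation,
    # as in the original command).
    terraform destroy
}

# Usage, from the directory that holds the Terraform config:
#   teardown_eks
```

Chaining with && is a design choice of this sketch: if the targeted destroy fails, the remaining state is left untouched for inspection.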

@aLekSer
Collaborator Author

aLekSer commented Oct 11, 2019

By the way, I have verified the AWS Security Group for UDP traffic: it works as expected with nc -u IP port when deployed to EKS with Terraform.
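
A minimal sketch of that kind of check, assuming the address and port are read from the GameServer status with kubectl. The function name udp_probe, the "hello" payload, and the 1-second -w timeout are assumptions for illustration; only nc -u IP port comes from the comment above, and nc timeout behaviour differs between nc variants.

```shell
# udp_probe GAMESERVER: hypothetical helper that sends one UDP datagram
# to the address and port reported for the given GameServer.
udp_probe() {
  gs_name="$1"
  # status.address and status.ports[0].port are fields of the GameServer status.
  addr="$(kubectl get gs "$gs_name" -o jsonpath='{.status.address}')"
  port="$(kubectl get gs "$gs_name" -o jsonpath='{.status.ports[0].port}')"
  # -u selects UDP; -w 1 is an assumed timeout so nc does not wait forever.
  echo "hello" | nc -u -w 1 "$addr" "$port"
}

# Usage (the GameServer name is a placeholder):
#   udp_probe <gameserver-name>
```

If the security group blocks the port, the probe prints nothing; a reply from the game server indicates UDP traffic gets through.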

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: 0807e4dd-5022-4b6d-89c5-73f9a1405bb9

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/986/head:pr_986 && git checkout pr_986
  • helm install install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.1.0-e9dc07b

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: 8d559665-b7a1-41b8-b93d-1a2de03cafd0

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/986/head:pr_986 && git checkout pr_986
  • helm install install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.1.0-0d9cbbe

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: e12af785-75e5-44c7-b3a5-4b4408df0b22

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/986/head:pr_986 && git checkout pr_986
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.3.0-9c5b0a9

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: 4bc8220e-4858-4c8d-a011-38f9a19ee830

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/986/head:pr_986 && git checkout pr_986
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.3.0-4d74996

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: 151d9abe-eeaa-458b-b3c5-3e2e72ac4b68

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/986/head:pr_986 && git checkout pr_986
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.3.0-891abf7



{{< alert title="Note" color="info" >}}
Current EKS config does not contain Helm Terraform configuration as for other Cloud Providers. That's because of a known issue with AWS Terraform provider:
Member

Suggested change
Current EKS config does not contain Helm Terraform configuration as for other Cloud Providers. That's because of a known issue with AWS Terraform provider:
Current EKS config does not contain Helm Terraform configuration as for other Cloud Providers. That's because of a known issue with the AWS Terraform provider:

{{< alert title="Note" color="info" >}}
There is an issue with terraform AWS provider:
https://github.com/terraform-providers/terraform-provider-aws/issues/9101
Due to the issue you should remove helm release first, otherwise `terraform destroy` will timeout and never succeed. Remove all created resources manually in that case.
Member

So maybe a silly question:

If we're directing people to install Agones manually via Helm, but they will still run into the same issue, why don't we point them to this workaround and install Agones via Helm through the Terraform provider, since they will likely have to do the workaround anyway?

Collaborator Author

@aLekSer aLekSer Jan 9, 2020

That's right, I had not thought of it this way; I will check this today. Running helm delete before terraform destroy is already in the docs.

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: c34bf1f6-fa68-4eaf-9986-e9b1cba2ad36

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/986/head:pr_986 && git checkout pr_986
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.3.0-1ab03e3

@aLekSer
Collaborator Author

aLekSer commented Jan 9, 2020

Tested that:

terraform apply --auto-approve
aws eks --region us-west-2 update-kubeconfig --name agones-cluster
kubectl config use-context arn:aws:eks:us-west-2:205003328410:cluster/agones-cluster
helm delete --purge agones && terraform destroy --auto-approve

works, while:

terraform apply --auto-approve && terraform destroy --auto-approve

leads to a timeout:

module.helm_agones.helm_release.agones: Still destroying... [id=agones, 6m10s elapsed]
module.eks_cluster.module.vpc.aws_internet_gateway.this[0]: Still destroying... [id=igw-052b23a8873718f88, 6m10s elapsed]                                                                                
module.helm_agones.helm_release.agones: Still destroying... [id=agones, 6m20s elapsed]
module.eks_cluster.module.vpc.aws_internet_gateway.this[0]: Still destroying... [id=igw-052b23a8873718f88, 6m20s elapsed] 
[... lines omitted]
Error: Error waiting for internet gateway (igw-052b23a8873718f88) to detach: timeout while waiting for state to become 'detached' (last state: 'detaching', timeout: 15m0s)                              

Error: rpc error: code = Unknown desc = timed out waiting for the condition
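
The working order above (remove the Helm release, then destroy) could be captured in a helper like the following sketch. The function name teardown_with_helm is hypothetical; helm delete --purge is the Helm 2 syntax used in this thread, and "agones" is the release name from the commands above.

```shell
# teardown_with_helm [RELEASE]: hypothetical helper that removes the Helm
# release before destroying the Terraform-managed infrastructure.
teardown_with_helm() {
  release="${1:-agones}"
  # Only run terraform destroy if the Helm release was removed successfully;
  # destroying with the release still installed is the case that timed out.
  helm delete --purge "$release" && terraform destroy --auto-approve
}

# Usage, after kubectl has been pointed at the cluster:
#   teardown_with_helm          # uses the default release name "agones"
```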

@aLekSer
Collaborator Author

aLekSer commented Jan 9, 2020

Now more variables are supported and documented:

terraform apply -var agones_version="1.2.0" -var node_count=2 -var region="us-west-2" -var cluster_name="agones-cl" --auto-approve
aws eks --region us-west-2 update-kubeconfig --name agones-cl
kubectl config use-context arn:aws:eks:us-west-2:205003328410:cluster/agones-cl
helm delete --purge agones && terraform destroy --auto-approve -var agones_version="1.2.0" -var node_count=2 -var region="us-west-2" -var cluster_name="agones-cl"
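
Since apply and destroy above have to receive the same -var values, one way to keep them in sync is a small wrapper, sketched here. The function name tf_with_vars and the environment variable names (AGONES_VERSION, NODE_COUNT, REGION, CLUSTER_NAME) are hypothetical; the four Terraform variables and their default values are the ones from the commands above.

```shell
# tf_with_vars ACTION: hypothetical wrapper that runs "terraform ACTION"
# with a single, shared set of -var flags.
tf_with_vars() {
  action="$1"
  # The environment variable names are assumptions of this sketch; the
  # defaults mirror the values used in the commands above.
  terraform "$action" --auto-approve \
    -var agones_version="${AGONES_VERSION:-1.2.0}" \
    -var node_count="${NODE_COUNT:-2}" \
    -var region="${REGION:-us-west-2}" \
    -var cluster_name="${CLUSTER_NAME:-agones-cl}"
}

# Usage:
#   tf_with_vars apply
#   helm delete --purge agones && tf_with_vars destroy
```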

@agones-bot
Collaborator

Build Failed 😱

Build Id: 32a56923-8e39-4f79-9579-5c42c68f19f1

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@aLekSer
Collaborator Author

aLekSer commented Jan 9, 2020

New E2E test failure:

Step #21: time="2020-01-09 17:18:01.284" level=info msg="fleet simple-fleet-fgq4b has 7/8 ready replicas"
Step #21: time="2020-01-09 17:18:01.491" level=info msg="fleet simple-fleet-k7phj has 7/8 ready replicas"
Step #21: --- FAIL: TestGameServerReserve (121.45s)
Step #21: gameserver_test.go:494:
Step #21: Error Trace: gameserver_test.go:494
Step #21: Error: Received unexpected error:
Step #21: timed out waiting for the condition
Step #21: waiting for GameServer to be Reserved default/udp-serverqbfln
Step #21: agones.dev/agones/test/e2e/framework.(*Framework).WaitForGameServerState
Step #21: /go/src/agones.dev/agones/test/e2e/framework/framework.go:131
Step #21: agones.dev/agones/test/e2e.TestGameServerReserve
Step #21: /go/src/agones.dev/agones/test/e2e/gameserver_test.go:493
Step #21: testing.tRunner
Step #21: /usr/local/go/src/testing/testing.go:909
Step #21: runtime.goexit
Step #21: /usr/local/go/src/runtime/asm_amd64.s:1357
Step #21: Test: TestGameServerReserve
Step #21: gameserver_test.go:495:
Step #21: Error Trace: gameserver_test.go:495
Step #21: Error: Not equal:
Step #21: expected: "Reserved"
Step #21: actual : "Ready"
Step #21:
Step #21: Diff:
Step #21: --- Expected
Step #21: +++ Actual
Step #21: @@ -1,2 +1,2 @@
Step #21: -(v1.GameServerState) (len=8) "Reserved"
Step #21: +(v1.GameServerState) (len=5) "Ready"
Step #21:
Step #21: Test: TestGameServerReserve

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: c2a6b93c-eac7-40cd-9eec-b0ba25cee3ad

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/986/head:pr_986 && git checkout pr_986
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.3.0-4e46f22

name = "${var.cluster_name}"
}

// TODO(alekser): Fix next Helm submodule
Member

With the latest changes, is this TODO still relevant?

Otherwise, this looks G2G.

Collaborator Author

Thanks, I will get rid of this TODO; the issue is referenced in the docs.

It provisions all necessary resources and firewall rules.
The Helm provider is left as a TODO, because adding it makes "terraform destroy" fail.
@agones-bot
Collaborator

Build Succeeded 👏

Build Id: e47266fc-f83d-46fa-b7ae-68722d8b8c08

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/986/head:pr_986 && git checkout pr_986
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.3.0-dc95769

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: 936847c7-ea31-429e-9b67-a334350cecff

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/986/head:pr_986 && git checkout pr_986
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.3.0-753f890

Member

@markmandel markmandel left a comment

👍

@google-oss-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aLekSer, markmandel

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@markmandel markmandel merged commit d3b9984 into googleforgames:master Jan 10, 2020
@markmandel markmandel added this to the 1.3.0 milestone Jan 10, 2020
ilkercelikyilmaz pushed a commit to ilkercelikyilmaz/agones that referenced this pull request Oct 23, 2020
It provisions all necessary resources and firewall rules.
The Helm provider is left as a TODO, because adding it makes "terraform destroy" fail.
Labels
approved area/operations Installation, updating, metrics etc kind/feature New features for Agones lgtm size/L
Development

Successfully merging this pull request may close these issues.

Terraform support for EKS
5 participants