Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite ASM module #1140

Merged

Conversation

Monkeyanator
Copy link
Contributor

@Monkeyanator Monkeyanator commented Feb 2, 2022

The existing ASM module is a near-1:1 wrapper around install_asm which is not ideal. This rewritten module uses the ControlPlaneRevision KRM API to provision the control plane. We still break out and use a provisioner to apply the CPR (due to limitations around the kubernetes_manifest resource) but it's much slimmer than before, and doesn't involve curling down an installation script.

@comment-bot-dev
Copy link

comment-bot-dev commented Feb 2, 2022

Thanks for the PR! 🚀
✅ Lint checks have passed.

@Monkeyanator Monkeyanator marked this pull request as ready for review February 10, 2022 23:40
examples/simple_zonal_with_asm/hub.tf Outdated Show resolved Hide resolved
modules/asm/main.tf Outdated Show resolved Hide resolved
modules/asm/main.tf Outdated Show resolved Hide resolved
modules/asm/main.tf Outdated Show resolved Hide resolved
modules/asm/main.tf Outdated Show resolved Hide resolved
modules/asm/main.tf Outdated Show resolved Hide resolved
modules/asm/scripts/create_cpr.sh Outdated Show resolved Hide resolved
modules/asm/scripts/create_cpr.sh Outdated Show resolved Hide resolved
@Monkeyanator
Copy link
Contributor Author

Addressed comments. CI failing on an issue that doesn't seem related to these changes though, any thoughts here? @bharathkkb

Step #27 - "converge simple-zonal-private-local":        
Step #27 - "converge simple-zonal-private-local":        Error: No value for required variable
Step #27 - "converge simple-zonal-private-local":        
Step #27 - "converge simple-zonal-private-local":          on variables.tf line 17:
Step #27 - "converge simple-zonal-private-local":          17: variable "project_ids" {
Step #27 - "converge simple-zonal-private-local":        
Step #27 - "converge simple-zonal-private-local":        The root module input variable "project_ids" is not set, and has no default
Step #27 - "converge simple-zonal-private-local":        value. Use a -var or -var-file command line argument to provide a value for
Step #27 - "converge simple-zonal-private-local":        this variable.
Step #27 - "converge simple-zonal-private-local":        
Step #27 - "converge simple-zonal-private-local":        Error: No value for required variable
Step #27 - "converge simple-zonal-private-local":        
Step #27 - "converge simple-zonal-private-local":          on variables.tf line 33:
Step #27 - "converge simple-zonal-private-local":          33: variable "compute_engine_service_accounts" {
Step #27 - "converge simple-zonal-private-local":        
Step #27 - "converge simple-zonal-private-local":        The root module input variable "compute_engine_service_accounts" is not set,
Step #27 - "converge simple-zonal-private-local":        and has no default value. Use a -var or -var-file command line argument to
Step #27 - "converge simple-zonal-private-local":        provide a value for this variable.
Step #27 - "converge simple-zonal-private-local":        
Step #27 - "converge simple-zonal-private-local":        Error: No value for required variable
Step #27 - "converge simple-zonal-private-local":        
Step #27 - "converge simple-zonal-private-local":          on variables.tf line 38:
Step #27 - "converge simple-zonal-private-local":          38: variable "registry_project_ids" {
Step #27 - "converge simple-zonal-private-local":        
Step #27 - "converge simple-zonal-private-local":        The root module input variable "registry_project_ids" is not set, and has no
Step #27 - "converge simple-zonal-private-local":        default value. Use a -var or -var-file command line argument to provide a
Step #27 - "converge simple-zonal-private-local":        value for this variable.
Step #27 - "converge simple-zonal-private-local": >>>>>> Running the command `terraform apply -auto-approve -lock=true -lock-timeout=0s -input=false -no-color -parallelism=10 -refresh=true  ` failed due to a non-zero exit code of 1.
Step #27 - "converge simple-zonal-private-local": >>>>>> ------Exception-------
Step #27 - "converge simple-zonal-private-local": >>>>>> Class: Kitchen::ActionFailed
Step #27 - "converge simple-zonal-private-local": >>>>>> Message: 1 actions failed.
Step #27 - "converge simple-zonal-private-local": >>>>>>     Converge failed on instance <simple-zonal-private-local>.  Please see .kitchen/logs/simple-zonal-private-local.log for more details
Step #27 - "converge simple-zonal-private-local": >>>>>> ----------------------
Step #27 - "converge simple-zonal-private-local": >>>>>> Please see .kitchen/logs/kitchen.log for more details
Step #27 - "converge simple-zonal-private-local": >>>>>> Also try running `kitchen diagnose --all` for configuration

@bharathkkb
Copy link
Member

bharathkkb commented Feb 24, 2022

@Monkeyanator I noticed it when working on another PR. You can switch to using the registry source for gke-project-asm in test/setup https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/blob/99e8f9b88eff8363eb7a06faff79e460c049ac35/test/setup/main.tf

@bharathkkb
Copy link
Member

Opened #1159 since it was affecting a few PRs

@Monkeyanator
Copy link
Contributor Author

Thanks for looking into it @bharathkkb, made the change from #1159 here but seems like we still run into issues about resources changing (hard to pin down where the step actually fails from the logs). Think the PR with just the fix is failing CI as well

@bharathkkb
Copy link
Member

@Monkeyanator seemed like the failure in #1159 was a flake. I have kicked it off again.

Also could we add some kind of upgrade guide for users coming from the old module? Is there a possible upgrade path minimizing disruption?

@Monkeyanator
Copy link
Contributor Author

@bharathkkb the exposed options and implementation are different enough from the previous implementation where it'll be tough to support a migration path here. Is it possible to document to pin to the old version for this first iteration? Updated the README documenting this.

modules/asm/README.md Outdated Show resolved Hide resolved
modules/asm/main.tf Outdated Show resolved Hide resolved
modules/asm/scripts/create_cpr.sh Outdated Show resolved Hide resolved
modules/asm/README.md Show resolved Hide resolved
@bharathkkb
Copy link
Member

the exposed options and implementation are different enough from the previous implementation where it'll be tough to support a migration path here.

Makes sense, I dropped a comment above to move that section to the upgrade guide.

@kaariger kaariger merged commit 0d9c44e into terraform-google-modules:master Mar 3, 2022
@gottfrois
Copy link

gottfrois commented Mar 3, 2022

Super excited that this PR has been merged. Do you think it will solve this issue? #1114

@Monkeyanator
Copy link
Contributor Author

Super excited that this PR has been merged. Do you think it will solve this issue? #1114

Yes, the segfault probably comes from somewhere in the install_asm script which we completely ditched in the new module

@elebioda
Copy link

Why was asm_version removed?

CPL-markus pushed a commit to WALTER-GROUP/terraform-google-kubernetes-engine that referenced this pull request Jul 15, 2024
* Remove previous ASM module and add initial implementation of new

* Move from kubernetes_manifest to bash CPR creation

* Add meshconfig.googleapis.com service

* Add namespace to CPR creation script wait

* Change example to use google_container_cluster

* Improve CPR status wait duration

* Use retries rather than sleeping to wait for CPR CRD existence

* Enable servicemesh feature in module

* Revert changes to examples

* Fix enable_mdp to just enable CNI

* Lint

* minor fixes

* Bump timeout on status wait

* Don't create MeshConfig if unset

* Fix ASM sample

* Update README autogen

* Minor fixes

* Remove meshConfig from module

* fix end to end tests

* lint fixes

* Remove enable_mdp, add enable_vpc_sc and fleet_id

* move VPC-SC from labels to annotations

* update README

* use default node pool size

* use wip for exmaple cluster creation

* add feature enablement back to module

* lint

* remove feature enablement from module

* remove depends_on

* fix unspecified channel bug

* minor fixes

* add more testing

* update docs

* change required module version

* fix cclb

* change registry source to run in CI

* update CROSS_CLUSTER_SERVICE_DISCOVERY to multicluster_mode

* update README

* fix test

* fix wording

* remove from README

* iterate on comments

Co-authored-by: kaariger <23217852+kaariger@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants