
[WIP] Require Google provider 4.0.0 #1071

Conversation

@jackwhelpton (Contributor)

No description provided.

@comment-bot-dev commented Nov 22, 2021

Thanks for the PR! 🚀
✅ Lint checks have passed.

@jackwhelpton (Contributor Author) commented Nov 22, 2021

This is a bit fiddly, so I'm not surprised to see there are some failures, although at least the validation succeeds now. Can somebody (@bharathkkb?) paste the Cloud Build logs for the int trigger failure here?

It'd be really handy if we could have comment-bot-dev relay those as well as the lint failures.

Hmm, on the subject of comment-bot-dev: even though linting passed, some of those errors still look pertinent. I'll take a look at them next.

@bharathkkb (Member) left a comment

@jackwhelpton Thanks for working on this!
We will need to update https://github.com/terraform-google-modules/terraform-google-gcloud for 4.0, which is used by some modules here.
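That update is the usual provider-constraint bump; a sketch with illustrative version bounds follows (the module's actual versions.tf may pin differently):

terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
      # widen the upper bound so the module can be used alongside 4.x
      version = ">= 3.53, < 5.0"
    }
  }
}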

* addresses warning about multiple provider blocks
@jackwhelpton (Contributor Author)

Thanks for the heads-up; I'll look at that bit next.

@jackwhelpton (Contributor Author)

Looks like the same problem over there: there's an automated PR that's passing the lint checks, but raising testing errors that aren't visible to us mere mortals. Any chance you could relay those? The PR in that case is terraform-google-modules/terraform-google-gcloud#108.

@bharathkkb (Member)

@jackwhelpton I have updated gcloud to allow 4.0 and will cut a release in a bit. You can use main to iterate on this PR.

@jackwhelpton (Contributor Author) commented Nov 23, 2021

There's quite a dependency chain here; next up is https://github.com/terraform-google-modules/terraform-google-vm/blob/master/modules/compute_disk_snapshot/versions.tf. That one doesn't seem to have an automated PR, so I'll raise one...

terraform-google-modules/terraform-google-vm#215 if anybody wants to help get that released.

@jackwhelpton (Contributor Author)

Thanks for the continued support on this. Looks like the terraform-google-vm change is merged, so you'll find me over at https://github.com/terraform-google-modules/terraform-google-bastion-host/blob/master/modules/bastion-group/main.tf#L42 next... I'll do the same thing there as we did in this PR: use the master branch until we've got a viable release for the vm module.

@bharathkkb (Member)

@jackwhelpton thanks for working on these. For modules we use in examples (like bastion host), we can leave the example at 3.x and not block this PR on that. If another module is used within a module (like gcloud), then we will need to fix those first.

@jackwhelpton (Contributor Author)

If the examples have dependencies that have dependencies that have ... that eventually rely on google < 4, doesn't that cause the conflicts we're still seeing above?

@bharathkkb (Member)

I forgot that in this case we have to constrain the GKE module to 4.0+ due to breaking changes, so you are right.
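For illustration (not code from the PR): once this module requires 4.0+, any example whose transitive dependencies still cap the provider below 4.0 becomes unsatisfiable, because terraform init must find a single google provider version matching every constraint:

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 4.0, < 5.0" # this module, after the upgrade
    }
  }
}

# meanwhile, somewhere in the example's dependency chain:
#   version = ">= 3.45, < 4.0"  # excludes every 4.x release
# -> terraform init fails: no available releases match the given constraints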

@bharathkkb (Member)

@jackwhelpton (Contributor Author)

Great, thanks... so I guess this guy is next?

terraform-google-modules/terraform-google-bastion-host#98

@@ -610,9 +607,10 @@ resource "google_container_node_pool" "pools" {
        for_each = local.cluster_node_metadata_config

        content {
-         node_metadata = lookup(each.value, "node_metadata", workload_metadata_config.value.node_metadata)
+         mode          = lookup(each.value, "node_metadata", workload_metadata_config.value.mode)
Contributor:

I'm not sure we want to change the input value (i.e. still look at node_metadata).

@jackwhelpton (Contributor Author) commented Nov 29, 2021

I need to refresh my memory on this (and find a line reference), but I think I'm still using the original input value; I've adjusted the workload_metadata_config object to match the names of the new properties, so it serves as an adapter between the two. At the time that seemed to make the most sense to me.
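A minimal sketch of that adapter idea, with hypothetical local and variable names (the PR's actual expressions may differ): the public node_metadata input keeps its 3.x shape, and a local re-keys it so the resource block only ever sees the 4.0 attribute name:

locals {
  # illustrative adapter: old input value, new attribute key
  cluster_node_metadata_config = var.node_metadata == "UNSPECIFIED" ? [] : [{
    mode = var.node_metadata
  }]
}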

@morgante (Contributor)

> Another question related to the above: I notice that the existing list of node pool names is concatenated with an empty string. Is there a reason for this? If we convert the other outputs to maps, should they have values with an empty key?

I believe this was done as a workaround for cases where the list was rebuilt. It's possible the bug has disappeared from Terraform Core, so we should be okay with not including an empty key in maps.
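For context, a sketch of the workaround under discussion, assuming the output is built roughly like this (the module's actual expression may differ):

# appending "" kept the output list non-empty while node pools were being
# rebuilt, sidestepping an old Terraform Core issue with computed outputs
output "node_pools_names" {
  description = "List of node pool names"
  value       = concat([for pool in google_container_node_pool.pools : pool.name], [""])
}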

  depends_on = [
    google_container_cluster.primary
  ]
}

output "instance_group_urls" {
Contributor:

I'd like to keep this output value, as it is helpful for broadly addressing the cluster. Could we simply concat all the instance groups from the different node pools?

Contributor Author:

By all means: so you'd keep the new node_pools_ outputs but also include this?

Contributor Author:

Ah, just saw your next comment, perhaps I'll wait for you to finish the review :)

I don't think I have enough knowledge about how the instance_group_urls output is currently consumed: it's obviously possible to keep it as a single flattened list, but now that the property has migrated to the node pool level within the provider, I worried about the loss of information that would result from doing that.

Contributor:

In my experience, it's most useful for addressing the cluster as a whole to apply networking changes. Let's leave it as-is—we can always add an additional output later if requests come in, but every output we add is an addition to the API surface.
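A sketch of keeping the flattened output under provider 4.0, assuming each managed node pool still exports its per-pool URLs via the instance_group_urls attribute:

# keep the single list by concatenating every pool's instance group URLs,
# rather than exposing a new per-pool output
output "instance_group_urls" {
  description = "List of GKE generated instance groups"
  value = flatten([
    for pool in google_container_node_pool.pools : pool.instance_group_urls
  ])
}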

  value = local.cluster_node_pools_versions
}

output "node_pools_instance_group_urls" {
Contributor:

Let's not add a new output unless needed (see comment below).

output "identity_namespace" {
description = "Workload Identity namespace"
value = length(local.cluster_workload_identity_config) > 0 ? local.cluster_workload_identity_config[0].identity_namespace : null
output "workload_pool" {
Contributor:

I don't think there's a real need to change this output name, since it's still pointing to the same value.

@@ -548,8 +536,8 @@ variable "database_encryption" {
  }]
}

variable "identity_namespace" {
  description = "Workload Identity namespace. (Default value of `enabled` automatically sets project based namespace `[project_id].svc.id.goog`)"
variable "workload_pool" {
Contributor:

I don't think we need to change this variable name (we can add a note in the description that this is otherwise known as workload_pool).
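A sketch of what that looks like, keeping the public 3.x name and switching only the provider-facing attribute (workload_identity_config/workload_pool are the provider 4.0 schema; the surrounding expressions are illustrative):

variable "identity_namespace" {
  description = "Workload Identity namespace, also known as workload_pool in provider 4.x. (Default value of `enabled` automatically sets project-based namespace `[project_id].svc.id.goog`)"
  type        = string
  default     = null
}

resource "google_container_cluster" "primary" {
  # ...
  workload_identity_config {
    # only the attribute name changes; the input variable keeps its old name
    workload_pool = var.identity_namespace == "enabled" ? "${var.project_id}.svc.id.goog" : var.identity_namespace
  }
}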

@jackwhelpton (Contributor Author)

I think I've addressed those comments; let me know what the build failure is now. Owing to our organizational setup, it's unfortunately pretty tricky for me to run these integration tests locally.

@bharathkkb (Member)

@jackwhelpton PFA logs. I didn't find anything off at a quick glance, but saw hashicorp/terraform-provider-google#10494:

Step #6 - "converge shared-vpc-local":        
Step #6 - "converge shared-vpc-local":        Error: 1 error occurred:
Step #6 - "converge shared-vpc-local":        	* one of source_tags, source_ranges, or source_service_accounts must be defined
Step #6 - "converge shared-vpc-local":        
Step #6 - "converge shared-vpc-local":        
Step #6 - "converge shared-vpc-local":        
Step #6 - "converge shared-vpc-local":          with module.example.module.gke.google_compute_firewall.master_webhooks[0],
Step #6 - "converge shared-vpc-local":          on ../../../firewall.tf line 63, in resource "google_compute_firewall" "master_webhooks":
Step #6 - "converge shared-vpc-local":          63: resource "google_compute_firewall" "master_webhooks" {
Step #6 - "converge shared-vpc-local":        
Step #6 - "converge shared-vpc-local": >>>>>> Running the command `terraform apply -auto-approve -lock=true -lock-timeout=0s -input=false -no-color -parallelism=10 -refresh=true  ` failed due to a non-zero exit code of 1.
Step #6 - "converge shared-vpc-local": >>>>>> ------Exception-------
Step #6 - "converge shared-vpc-local": >>>>>> Class: Kitchen::ActionFailed
Step #6 - "converge shared-vpc-local": >>>>>> Message: 1 actions failed.
Step #6 - "converge shared-vpc-local": >>>>>>     Converge failed on instance <shared-vpc-local>.  Please see .kitchen/logs/shared-vpc-local.log for more details
Step #6 - "converge shared-vpc-local": >>>>>> ----------------------
Step #6 - "converge shared-vpc-local": >>>>>> Please see .kitchen/logs/kitchen.log for more details
Step #6 - "converge shared-vpc-local": >>>>>> Also try running `kitchen diagnose --all` for configuration
Finished Step #6 - "converge shared-vpc-local"
ERROR
ERROR: build step 6 "gcr.io/cloud-foundation-cicd/cft/developer-tools:1.0" failed: step exited with non-zero status: 20

@jackwhelpton (Contributor Author)

Thanks for the repro on that linked ticket; it does indeed look like a provider bug. Damnit.

@jackwhelpton (Contributor Author) commented Dec 1, 2021

I'm trying to set up an environment where I can run the integration tests locally... I actually made some progress towards this during earlier work on the Workload Identity module. I've got to a point where I can prepare the test environment using make docker_test_prepare, but when I execute make docker_test_integration everything fails.

Looking in the .kitchen/logs directory that's been created, I find lots of errors of the form shown below: Terraform seems to be parsing .tf files that are links to a shared file as though the link target path were the file's content. Any ideas how I can fix this?

I've tried walking through https://codelabs.developers.google.com/codelabs/cft-onboarding/#7 and running in interactive mode (executing a single example), but the results are the same.

I, [2021-12-01T18:46:22.019544 #1714]  INFO -- disable-client-cert-local: There are some problems with the configuration, described below.

The Terraform configuration must be valid before initialization so that
Terraform can determine which modules and providers need to be installed.

Error: Argument or block definition required

  on outputs.tf line 1:
   1: ../shared/outputs.tf

An argument or block definition is required here.

Error: Argument or block definition required

  on variables.tf line 1:
   1: ../shared/variables.tf

An argument or block definition is required here.

E, [2021-12-01T18:46:22.026027 #1714] ERROR -- disable-client-cert-local: Destroy failed on instance <disable-client-cert-local>.

@jackwhelpton (Contributor Author) commented Dec 7, 2021

Oh boo. @bharathkkb, could you share the reason for this recent failure so I can look into it? As far as I'm aware, all I did was update to use the newly published versions of a couple of dependencies.

I'm assuming it's just down to the known firewall bug, which we're now discussing here: GoogleCloudPlatform/magic-modules#5526

…terraform-google-kubernetes-engine into feature/provider-upgrade

# Conflicts:
#	autogen/main/versions.tf.tmpl
#	examples/node_pool_update_variant_beta/main.tf
#	examples/node_pool_update_variant_public_beta/main.tf
#	examples/regional_private_node_pool_oauth_scopes/provider.tf
#	examples/safer_cluster/main.tf
#	examples/safer_cluster_iap_bastion/provider.tf
#	examples/simple_regional_beta/main.tf
#	examples/simple_regional_private_beta/main.tf
#	examples/simple_zonal_with_asm/main.tf
#	examples/workload_metadata_config/main.tf
#	modules/beta-private-cluster-update-variant/versions.tf
#	modules/beta-private-cluster/versions.tf
#	modules/beta-public-cluster-update-variant/versions.tf
#	modules/beta-public-cluster/versions.tf
@bharathkkb (Member)

@jackwhelpton looks like the latest error is from workload-metadata-config, possibly via:

Error: expected node_pool.0.node_config.0.workload_metadata_config.0.mode to be one of [MODE_UNSPECIFIED GCE_METADATA GKE_METADATA], got SECURE
       
         with module.example.module.gke.google_container_cluster.primary,
         on ../../../modules/private-cluster/cluster.tf line 137, in resource "google_container_cluster" "primary":
        137:     node_config {

@jackwhelpton (Contributor Author)

That makes sense, as SECURE has been deprecated now... fixed that (hopefully?)
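For reference, a sketch of the value mapping the fix implies, based on the 4.0 enum in the error above (illustrative local name; SECURE was removed outright, so how the module handles it is the PR's decision):

locals {
  # 3.x node_metadata values -> 4.0 workload_metadata_config.mode
  node_metadata_to_mode = {
    "UNSPECIFIED"         = "MODE_UNSPECIFIED"
    "EXPOSE"              = "GCE_METADATA"
    "GKE_METADATA_SERVER" = "GKE_METADATA"
  }
}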

@raj-saxena (Contributor)

Thanks for the work so far on this, @jackwhelpton & @bharathkkb. We have been eagerly waiting on this module to support Google provider version > 4.0.0.
Is there an ETA to get this out? Is there some way in which I can help?

@bharathkkb (Member) left a comment

Mostly LGTM with one comment below. Looks like we also need to rebase this, @jackwhelpton.

@@ -81,6 +81,7 @@ resource "google_compute_firewall" "master_webhooks" {
   direction = "INGRESS"

   source_ranges = [local.cluster_endpoint_for_nodes]
+  source_tags   = [""]
Member:

CI seems to be failing due to this. IIRC we added this due to hashicorp/terraform-provider-google#10494. Maybe we should do source_tags = [] as a workaround:

       Error: Error creating Firewall: googleapi: Error 400: Invalid value for field 'resource.sourceTags[0]': ''. Must be a match of regex '(?:[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?)', invalid
       
         with module.example.module.gke.google_compute_firewall.master_webhooks[0],
         on ../../../firewall.tf line 63, in resource "google_compute_firewall" "master_webhooks":
         63: resource "google_compute_firewall" "master_webhooks" {
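A sketch of that suggested workaround: an empty list instead of a list containing an empty string, so no invalid "" tag is ever sent to the API (whether it also satisfies the provider-side check is exactly what the linked issue covers):

resource "google_compute_firewall" "master_webhooks" {
  # ...
  direction = "INGRESS"

  source_ranges = [local.cluster_endpoint_for_nodes]
  # empty list: nothing for the API's tag regex to reject
  source_tags = []
}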

Contributor Author:

I thought the "correct" (?) fix for that was covered by this:

GoogleCloudPlatform/magic-modules#5526

so we may still see the CI failing until that (or something better) is merged.

On a more personal note, I left my previous employer at the end of last year, so it may be hard for me to take this much further, as the CLA etc. was signed with that email. I'm in touch with a former coworker who I'm going to try and persuade to finish this off for me; I'll let you know how that goes.

Contributor:

> On a more personal note, I left my previous employer at the end of last year, so it may be hard for me to take this much further, as the CLA etc. was signed with that email. I'm in touch with a former coworker who I'm going to try and persuade to finish this off for me; I'll let you know how that goes.

Thanks, we can probably follow through if necessary as well.

@bharathkkb (Member)

Superseded by #1129.

bharathkkb closed this on Jan 22, 2022.