Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow updating workload identity in GKE clusters without force create #896

Conversation

sandeepsukhani
Copy link
Contributor

Since gcloud allows updating workload identity from console and sdk without recreating the cluster, terraform also should not force recreation.

I have tested it locally with this change, it worked fine for me without recreating the cluster.

@ghost ghost added the size/xs label Jun 26, 2019
@queeno
Copy link

queeno commented Jun 29, 2019

👍 I completely agree. I am not sure why terraform was behaving differently...

@rileykarson rileykarson self-requested a review July 1, 2019 18:00
@rileykarson
Copy link
Collaborator

The reason we didn't add support for updating initially is that Terraform runs into a bit of a problem with the node_metadata value on node pools. Terraform couldn't (or at least, couldn't easily) update in the correct order so that:

  • No node pool tries to set node_metadata set to GKE_METADATA_SERVER before enabling Workload Identity

  • Each node pool has unset node_metadata from GKE_METADATA_SERVER when disabling Workload Identity

While we could have Terraform update the value on node pools in the cluster, that would be overreaching what it says it would do in terraform plan and I'm very hesitant to make it do that. As well, we'd need to arbitrarily choose one of SECURE or EXPOSE if we did that, which wouldn't be great if the chosen value breaks users.

Alternatively, we could attempt the update and fail if it's impossible. That's what this PR does, I think. I anticipate this being frustrating for users because updating node_metadata and workload_identity_config in the same terraform apply won't work- but this could end up being the best option regardless.

I'll play around with the API to refresh myself on how it behaves, and verify what I said here.

@sandeepsukhani
Copy link
Contributor Author

sandeepsukhani commented Jul 1, 2019

The reason why I want this is recreating cluster loses all the deployments and would add up to the downtime and overhead. I would be happy to do any changes to make the user experience better.
I also had this question about update order but then I thought maybe Terraform handles it already and didn't get a chance to check it.

@rileykarson
Copy link
Collaborator

rileykarson commented Jul 3, 2019

Just a brief status update- I'm verifying that what I said about the behaviour of updates is correct. I was out yesterday, but should loop back here by EOD. If not, I will on Monday.

Edit: I looked at this briefly during the week, but didn't end up finishing. I should be able to get to it next week, though.

@sandeepsukhani
Copy link
Contributor Author

Hey @rileykarson,
Thanks for having a look at this. Did you get a chance to finish validating the behaviour?

@rileykarson
Copy link
Collaborator

rileykarson commented Jul 17, 2019

Sorry this took so long! I ran into some issues with Terraform partial updates when attempting this initially, and then some other work kept pulling me off.

So, this works when turning on Workload Identity and enabling the new value on node pools. It won't work when turning it off though, as Terraform will update the cluster before attempting to update the node pool. Unfortunately, Terraform is single-pass so it can't handle these cases well.

We'll need to note that users need to explicitly use another value on all node pools in the docs under the website/r/ folder.

In addition, I'd like to add a test for update. Can you change the value in https://github.com/terraform-providers/terraform-provider-google-beta/blob/master/google-beta/resource_container_cluster_test.go#L738-L762? https://github.com/terraform-providers/terraform-provider-google-beta/blob/master/google-beta/resource_container_cluster_test.go#L564-L600 is an example testing update, you'll see that there are multiple configs.

I sent a PR to your fork (it should be linked in GH below) that adds some massaging necessary for disablement. If you merge that, it'll show up as part of the changes here as well. Sorry for the extensive patch! I was playing around with changing the value in clusters, and ended up making a handful of changes on top of your branch to test it out.

@sandeepsukhani
Copy link
Contributor Author

@rileykarson Thanks for the PR, I have merged it.
Regarding the test, shouldn't this be the one that should be updated? https://github.com/terraform-providers/terraform-provider-google-beta/blob/master/google-beta/resource_container_cluster_test.go#L1299-L1322

@sandeepsukhani
Copy link
Contributor Author

sandeepsukhani commented Jul 18, 2019

@rileykarson It seems gcloud api is not accepting empty value for IdentityNamespace. Its throwing Error: googleapi: Error 400: Must specify a field to update., badRequest. I could not find anything related to disabling it using api or sdk in the docs. Does it work for you?

@rileykarson
Copy link
Collaborator

Hmm- can you share the config you're testing with? It's possible that we've specified the removal of the namespace differently. I'd alternated between these:

resource "google_container_cluster" "cluster" {
  provider = "google-beta"
  name               = "tf-cluster-nodepool-test"
  zone               = "us-central1-a"
  remove_default_node_pool = true
  initial_node_count = 1
  min_master_version = "${data.google_container_engine_versions.central1a.latest_master_version}"

  workload_identity_config {
    identity_namespace = "my namespace"
  }
}
resource "google_container_cluster" "cluster" {
  provider = "google-beta"
  name               = "tf-cluster-nodepool-test"
  zone               = "us-central1-a"
  remove_default_node_pool = true
  initial_node_count = 1
  min_master_version = "${data.google_container_engine_versions.central1a.latest_master_version}"
}

@ghost ghost added size/m and removed size/s labels Jul 19, 2019
@sandeepsukhani
Copy link
Contributor Author

@rileykarson Thanks for sharing the config, I figured out the issue. I was trying to set identity_namespace to empty string instead of removing the whole workload_identity_config block. I have pushed the tests, please let me know whether it looks good.
In the tests, I had to disable workload identity in node pool before turning it off for the cluster as you said.

@sandeepsukhani
Copy link
Contributor Author

Hey @rileykarson,
Do you have any thoughts on the last changes that I pushed?

Copy link
Collaborator

@rileykarson rileykarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that! I didn't notice the notification for the code push. LGTM!

I'll upstream this to our code generator to make sure the change can easily be applied to the google provider when the feature goes GA, and merge this once it's been staged there.

@rileykarson
Copy link
Collaborator

Actually, looks like the test file has been updated. Do you mind rebasing / merging the branch so there are no conflicts?

@sandeepsukhani sandeepsukhani force-pushed the workload-identity-update-wo-force-create branch from 254eab1 to 4dc21b2 Compare July 26, 2019 06:02
Since gcloud allows updating workload identity from console and sdk without recreating the cluster, terraform also should not force recreation.
@sandeepsukhani sandeepsukhani force-pushed the workload-identity-update-wo-force-create branch from 4dc21b2 to 275b903 Compare July 26, 2019 06:14
@sandeepsukhani
Copy link
Contributor Author

@rileykarson Thanks for the review! Rebased and squashed the commits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants