Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enables an existing GSA to be used when setting up Workload Identity #955

Merged
merged 14 commits into from
Jul 20, 2021

Conversation

jackwhelpton
Copy link
Contributor

@jackwhelpton jackwhelpton commented Jul 16, 2021

Resolves #956.

The workload-identity module currently allows for reusing an existing KSA, but not an existing GSA. This is a common requirement for us: typically, we will have two clusters within the same project (for example, a production cluster and a disaster recovery cluster), and want to configure KSAs in each cluster that are bound to the same GSA.

With the existing module, this isn't possible, as errors are thrown owing to the attempt to recreate an existing GSA.

This is a work-in-progress: I believe it's feature complete, but I'm still testing.

@comment-bot-dev
Copy link

comment-bot-dev commented Jul 16, 2021

Thanks for the PR! 🚀
✅ Lint checks have passed.

modules/workload-identity/variables.tf Show resolved Hide resolved
modules/workload-identity/variables.tf Outdated Show resolved Hide resolved
modules/workload-identity/variables.tf Outdated Show resolved Hide resolved
modules/workload-identity/output.tf Outdated Show resolved Hide resolved
modules/workload-identity/main.tf Outdated Show resolved Hide resolved
@jackwhelpton
Copy link
Contributor Author

@morgante I think all the changes you requested have been made now. The resulting module plans fine for me locally. I'm not sure why the kubernetes-engine-int-trigger check is failing, but unfortunately I don't seem to have permissions to view the Cloud Build logs... any further info you can give me to help me fix that up would be great.

@morgante
Copy link
Contributor

@jackwhelpton Here's the error:

       module.example.module.workload_identity_existing_ksa.google_service_account_iam_member.main: Creation complete after 4s [id=projects/ci-gke-c3429ab2-941a/serviceAccounts/existing-regional-cluster-4z3y@ci-gke-c3429ab2-941a.iam.gserviceaccount.com/roles/iam.workloadIdentityUser/serviceaccount:ci-gke-c3429ab2-941a.svc.id.goog[default/foo-ksa]]
       
       Error: "account_id" ("projects/ci-gke-c3429ab2-941a/serviceAccounts/existing-regional-cluster-4z3y@ci-gke-c3429ab2-941a.iam.gserviceaccount.com") doesn't match regexp "^[a-z](?:[-a-z0-9]{4,28}[a-z0-9])$"
       
         on ../../../modules/workload-identity/main.tf line 32, in data "google_service_account" "cluster_service_account":
         32:   account_id = local.gcp_sa_name
       
       
       
       Error: "account_id" ("projects/ci-gke-c3429ab2-941a/serviceAccounts/iden-regional-cluster-4z3y@ci-gke-c3429ab2-941a.iam.gserviceaccount.com") doesn't match regexp "^[a-z](?:[-a-z0-9]{4,28}[a-z0-9])$"
       
         on ../../../modules/workload-identity/main.tf line 32, in data "google_service_account" "cluster_service_account":
         32:   account_id = local.gcp_sa_name

@jackwhelpton
Copy link
Contributor Author

jackwhelpton commented Jul 19, 2021

Thanks! I should have resolved that (using account_id rather than name), but still seeing an error. If there's a way for me to view these logs without bothering you every time, that probably be good for both of us!

I ran through the instructions for running tests locally, but owing to our organization restrictions it'd be preferable for me to use a service account in an existing project rather than having the test setup automatically create a new project. I've tried doing that, but end up getting further non-related errors, things like:

       Error: Argument or block definition required

         on outputs.tf line 1:
          1: ../shared/outputs.tf

       An argument or block definition is required here.


       Error: Argument or block definition required

         on variables.tf line 1:
          1: ../shared/variables.tf

       An argument or block definition is required here.

Given all that, it'd certainly be easier if I can just have the CI process run these tests for me.

@morgante
Copy link
Contributor

@jackwhelpton If you've having trouble setting up a test environment, this codelab might help: https://codelabs.developers.google.com/codelabs/cft-onboarding/#9

Unfortunately it's impossible for us to expose Cloud Build logs publicly. One other option for debugging is to just try terraform applying the example directly.

Here's the latest failure:

       module.example.module.workload_identity_existing_ksa.module.annotate-sa.module.gcloud_kubectl.null_resource.run_command[0]: Creation complete after 7s [id=2698573458861851095]
       
       Error: "service_account_id" ("iden-regional-cluster-mfsy") doesn't match regexp "projects/(?:(?:[-a-z0-9]{1,63}\\.)*(?:[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?):)?(?:[0-9]{1,19}|(?:[a-z0-9](?:[-a-z0-9]{0,61}[a-z0-9])?)|-)/serviceAccounts/((?:(?:[-a-z0-9]{1,63}\\.)*(?:[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?):)?(?:[0-9]{1,19}|(?:[a-z0-9](?:[-a-z0-9]{0,61}[a-z0-9])?))@[a-z]+.gserviceaccount.com$|[0-9]{1,20}-compute@developer.gserviceaccount.com|[a-z](?:[-a-z0-9]{4,28}[a-z0-9])@[-a-z0-9\\.]{1,63}\\.iam\\.gserviceaccount\\.com$)"
       
         on ../../../modules/workload-identity/main.tf line 75, in resource "google_service_account_iam_member" "main":
         75:   service_account_id = local.gcp_sa_name
       
       
       
       Error: "service_account_id" ("existing-regional-cluster-mfsy") doesn't match regexp "projects/(?:(?:[-a-z0-9]{1,63}\\.)*(?:[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?):)?(?:[0-9]{1,19}|(?:[a-z0-9](?:[-a-z0-9]{0,61}[a-z0-9])?)|-)/serviceAccounts/((?:(?:[-a-z0-9]{1,63}\\.)*(?:[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?):)?(?:[0-9]{1,19}|(?:[a-z0-9](?:[-a-z0-9]{0,61}[a-z0-9])?))@[a-z]+.gserviceaccount.com$|[0-9]{1,20}-compute@developer.gserviceaccount.com|[a-z](?:[-a-z0-9]{4,28}[a-z0-9])@[-a-z0-9\\.]{1,63}\\.iam\\.gserviceaccount\\.com$)"
       
         on ../../../modules/workload-identity/main.tf line 75, in resource "google_service_account_iam_member" "main":
         75:   service_account_id = local.gcp_sa_name

@jackwhelpton
Copy link
Contributor Author

jackwhelpton commented Jul 20, 2021

Ta. I'm discussing with our ops team at the moment about creating a sandbox environment where we can use the integration test suite more as intended. In the meantime, I've addressed that most recent error and it seems like all the tests pass now. Any further changes I should make?

Feel free to squash these commits when you merge, I'm not sure much of this history is particularly edifying for others!

@morgante
Copy link
Contributor

I'm discussing with our ops team at the moment about creating a sandbox environment where we can use the integration test suite more as intended. In the meantime, I've addressed that most recent error and it seems like all the tests pass now. Any further changes I should make?

Thanks for working through this. We're also working on a new test framework which should hopefully make contributions a bit easier.

@morgante morgante merged commit 712fc54 into terraform-google-modules:master Jul 20, 2021
@jackwhelpton
Copy link
Contributor Author

I know this is merged, but saw there was a slight problem with the renaming of cluster_service_account to main, so wanted to clear up a few questions to avoid that happening again:

  • is it not expected that resources might change location between module versions, especially major releases, necessitating a terraform state mv when upgrading?
  • this PR made the google_service_account optional, meaning that even with the original name, the resource would move from google_service_account.cluster_service_account to google_service_account.cluster_service_account[0]. Is the zero-index privileged in some way in the state file, such that this refactor does not require state moves?

I'd assumed that the change to google_service_account.cluster_service_account[0] would be a breaking change, and given that it seemed to make sense to use the chance to refactor the GSA resources to line up more closely with the KSA ones.

@bharathkkb
Copy link
Member

Hi @jackwhelpton

is it not expected that resources might change location between module versions, especially major releases, necessitating a terraform state mv when upgrading?

Yes they may change, but we try to minimize it whenever possible and if necessary add docs with helper commands (like tf state mv) to help with the upgrade. In this case, we thought it made sense to keep the resource name as is and because we missed identifying this as breaking during the release.

this PR made the google_service_account optional, meaning that even with the original name, the resource would move from google_service_account.cluster_service_account to google_service_account.cluster_service_account[0]. Is the zero-index privileged in some way in the state file, such that this refactor does not require state moves?

I believe so, since technically when we use count=1 it is possibly to assume the zero instance is the instance that existed before(vice versa if were to go from count =1 to using no count) and TF probably takes this into account to prevent unnecessary diffs.

@jackwhelpton
Copy link
Contributor Author

Thanks for those clarifications. I knew writing, e.g. cluster_service_account.name rather than cluster_service_account[0].name in TF code would result in an error, but I hadn't realized the equivalence TF was drawing at plan-time; another compelling argument for using count over for_each for optional resources.

If I'd known that in advance, I wouldn't have done that rename, so fully agree with reverting it: the consistency between GSA and KSA names isn't worth the added update pain. Cheers for fixing the resulting mess too :)

CPL-markus pushed a commit to WALTER-GROUP/terraform-google-kubernetes-engine that referenced this pull request Jul 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable an existing GSA to be used by the workload-identity module
4 participants