Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switching a policy for individual members causes a race condition #1613

Closed
Stono opened this issue Jun 7, 2018 · 9 comments
Closed

Switching a policy for individual members causes a race condition #1613

Stono opened this issue Jun 7, 2018 · 9 comments
Assignees
Labels

Comments

@Stono
Copy link

Stono commented Jun 7, 2018

Hola!
We were migrating away from google_project_iam_policy in favour of a bunch of google_project_iam_member resources.

The terraform apply tried to simultaneously delete the existing policy whilst creating the members, resulting in:

* google_project_iam_policy.artefacts_projects: Error applying IAM policy to project: Error applying IAM policy for project "at-artefacts". Policy is &cloudresourcemanager.Policy{AuditConfigs:[]*cloudresourcemanager.AuditConfig(nil), Bindings:[]*cloudresourcemanager.Binding(nil), Etag:"BwVuCwTaqFA=", Version:1, ServerResponse:googleapi.ServerResponse{HTTPStatusCode:200, Header:http.Header{"Server":[]string{"ESF"}, "Cache-Control":[]string{"private"}, "X-Frame-Options":[]string{"SAMEORIGIN"}, "X-Content-Type-Options":[]string{"nosniff"}, "Vary":[]string{"Origin", "X-Origin", "Referer"}, "Date":[]string{"Thu, 07 Jun 2018 17:18:30 GMT"}, "X-Xss-Protection":[]string{"1; mode=block"}, "Alt-Svc":[]string{"quic=\":443\"; ma=2592000; v=\"43,42,41,39,35\""}, "Content-Type":[]string{"application/json; charset=UTF-8"}}}, ForceSendFields:[]string(nil), NullFields:[]string(nil)}, error is googleapi: Error 409: There were concurrent policy changes. Please retry the whole read-modify-write with exponential backoff., aborted

Might be worth a lock across policy, and member resources?

@rosbo
Copy link
Contributor

rosbo commented Jun 7, 2018

We already have a lock... The mutex key is iam-project-[project-id]. There should only be one IAM call happening simultaneously for a given project:
https://github.com/terraform-providers/terraform-provider-google/blob/master/google/iam_project.go#L68

It looks like the bug was caused by an etag mismatch. We have code handling 409 (Conflict) and it should retry. However, it seems from the error message that the error was wrapped twice. The retry logic probably failed to retry because the double wraping hid the fact that the underlying error was a 409.

@rosbo rosbo removed the enhancement label Jun 7, 2018
@rosbo rosbo assigned rosbo and unassigned rosbo Jun 7, 2018
@danawillow
Copy link
Contributor

Ah, we have a mutex for the _member and _binding resources but not for _policy. @Stono, do you have a config you could share that reproduces the error? That would help me test a fix.

@danawillow danawillow self-assigned this Jun 11, 2018
@Stono
Copy link
Author

Stono commented Jun 11, 2018

ahh @danawillow I don't now i'm afraid, but long story short we had a google_project_iam_policy.

and in one commit, we removed google_project_iam_policy and introduced a bunch of google_project_iam_member, so one plan was both simultaneously deleting the project policy and trying to set members on it. Should be easy enough to reproduce

@danawillow
Copy link
Contributor

Cool, got it. Just sent out #1645.

Note though that although this fixes the error in question, there will still be a potential race condition left depending on which resource acquires the lock first. If the iam_policy resource does, then we're all good- it'll restore the policy to what it was before the resource was ever created, and then the iam_member will apply to recreate the policy. However, if the iam_member resource acquires the lock first, then it'll apply the policy that already exists, and then the iam_policy resource will remove it. Re-running terraform would fix it in that case so it's not the end of the world, but just be aware of that.

@stevemsmith
Copy link

I'm in the same situation. Tried to build the provider based on #1645 and it's not working. It's entirely possible I don't know what I'm doing, but here's what I did...

  1. Followed the instructions to build.
  2. Copied provider to ~/.terraform.d/plugins/darwin_amd64
  3. Ran "init"
  4. Ran "plan"
  5. Ran "apply"

And I got the same error:

  • google_project_iam_policy.project-policy: Error applying IAM policy to project: Error applying IAM policy for project .......

Ran it twice, just to be sure. Any help greatly appreciated.

@danawillow
Copy link
Contributor

@biff2005 can you run TF_LOG=DEBUG terraform apply and share the output as a gist?

@stevemsmith
Copy link

Here you go...
https://gist.github.com/biff2005/511e00e23cdd9ead820c9cbabaa55a5a

Thanks for the assistance!

@danawillow
Copy link
Contributor

Ah, ok. That's a different error message- this one is a 400 invalid_argument. Is that the entire debug output? The error message Error applying IAM policy for project only occurs in our codebase after a setIamPolicy call, but I don't see one in the logs.

@ghost
Copy link

ghost commented Nov 18, 2018

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

@ghost ghost locked and limited conversation to collaborators Nov 18, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

4 participants