
google_project_iam_binding & google_project_service incorrect destruction order #1356

Closed
hawksight opened this issue Apr 19, 2018 · 4 comments

Comments

@hawksight

hawksight commented Apr 19, 2018

Terraform Version

Terraform v0.11.7
+ provider.google v1.9.0
+ provider.null v1.0.0

Affected Resource(s)

  • google_project_iam_binding
  • google_project_service

Terraform Configuration Files

The following is an extract, but it gets the gist of the main issue here:

# --- Resource Configuration --- #
resource "google_project" "project" {
  name            = "some_project"
  project_id      = "some_project"
  billing_account = "${var.GCLOUD_BILLING_ID}"
  org_id          = "${var.GCLOUD_ORG_ID}"
  skip_delete     = "true"
}

/* Add admins to the project from something email group
 */
resource "google_project_iam_binding" "admins" {
  project = "${google_project.project.project_id}"
  role    = "roles/owner"

  members = [
    "group:something@example.com",
    "serviceAccount:terraform@some_other_project.iam.gserviceaccount.com"
  ]
}

/* Need to enable apis / services
 * Docs: https://www.terraform.io/docs/providers/google/r/google_project_service.html
 * This will enable extras and leave defaults enabled
 */
resource "google_project_service" "enable" {
  count   = "${length(var.PROJECT_SERVICES)}"
  project = "${google_project.project.project_id}"
  service = "${element(var.PROJECT_SERVICES, count.index)}"
}

/* Without this wait, subsequent resources error because of the async nature
 * of enabling APIs within gcloud. So ensues a hack!
 */
resource "null_resource" "wait" {
  # Use this to wait a minute
  provisioner "local-exec" {
    command     = "import time; time.sleep(60)"
    interpreter = ["python3", "-c"]
  }
  depends_on = ["google_project_service.enable"]
}

/* Now make some actual resources to serve infrastructure
 */
module "network" {...}

module "cluster" {...}

module "lb-http" {...}

Debug Output

The plan to destroy: terraform plan -var-file=inputs.tfvars -out plan -destroy

Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

null_resource.np: Refreshing state... (ID: 2454169200766082532)
google_project.project: Refreshing state... (ID: some_project)
data.google_compute_lb_ip_ranges.lb: Refreshing state...
google_project_iam_binding.admins: Refreshing state... (ID: some_project/roles/owner)
google_project_service.enable[0]: Refreshing state... (ID: some_project/compute.googleapis.com)
google_compute_network.vpc: Refreshing state... (ID: vpc-ops)
google_project_service.enable[1]: Refreshing state... (ID: some_project/container.googleapis.com)
google_compute_subnetwork.subnet: Refreshing state... (ID: europe-west2/vpc-ops-subnet1)
google_compute_health_check.default: Refreshing state... (ID: vpc-ops-lb-backend-0)
google_compute_firewall.ingress: Refreshing state... (ID: vpc-ops-k8s-ingress)
google_compute_firewall.default-hc: Refreshing state... (ID: vpc-ops-lb-hc-0)
google_compute_global_address.default: Refreshing state... (ID: vpc-ops-lb-address)
google_compute_firewall.fw-lb: Refreshing state... (ID: vpc-ops-lb-allow-web)
google_container_cluster.cluster: Refreshing state... (ID: k8s-ops)
google_container_node_pool.pools: Refreshing state... (ID: europe-west2-a/k8s-ops/vpc-ops-np-1)
google_compute_backend_service.default: Refreshing state... (ID: vpc-ops-lb-backend-0)
google_compute_url_map.custom: Refreshing state... (ID: some_project-url-map)
google_compute_target_http_proxy.default: Refreshing state... (ID: vpc-ops-lb-http-proxy)
google_compute_global_forwarding_rule.http: Refreshing state... (ID: vpc-ops-lb-http)

------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  - destroy

Terraform will perform the following actions:

  - google_compute_url_map.custom

  - google_project.project

  - google_project_iam_binding.admins

  - google_project_service.enable[0]

  - google_project_service.enable[1]

  - module.cluster.google_compute_firewall.ingress

  - module.cluster.google_container_cluster.cluster

  - module.cluster.google_container_node_pool.pools

  - module.cluster.null_resource.np

  - module.lb-http.google_compute_backend_service.default

  - module.lb-http.google_compute_firewall.default-hc

  - module.lb-http.google_compute_firewall.fw-lb

  - module.lb-http.google_compute_global_address.default

  - module.lb-http.google_compute_global_forwarding_rule.http

  - module.lb-http.google_compute_health_check.default

  - module.lb-http.google_compute_target_http_proxy.default

  - module.network.google_compute_network.vpc

  - module.network.google_compute_subnetwork.subnet


Plan: 0 to add, 0 to change, 18 to destroy.

------------------------------------------------------------------------

This plan was saved to: plan

To perform exactly these actions, run the following command to apply:
    terraform apply "plan"

Apply output

google_project_iam_binding.admins: Destroying... (ID: some_project/roles/owner)
google_project_service.enable[1]: Destroying... (ID: some_project/container.googleapis.com)
google_project_service.enable[0]: Destroying... (ID: some_project/compute.googleapis.com)
module.lb-http.google_compute_global_forwarding_rule.http: Destroying... (ID: vpc-ops-lb-http)
module.cluster.google_compute_firewall.ingress: Destroying... (ID: vpc-ops-k8s-ingress)
module.lb-http.google_compute_firewall.fw-lb: Destroying... (ID: vpc-ops-lb-allow-web)
module.cluster.google_container_node_pool.pools: Destroying... (ID: europe-west2-a/k8s-ops/vpc-ops-np-1)
module.lb-http.google_compute_firewall.default-hc: Destroying... (ID: vpc-ops-lb-hc-0)
google_project_iam_binding.admins: Destruction complete after 6s
module.lb-http.google_compute_firewall.fw-lb: Still destroying... (ID: vpc-ops-lb-allow-web, 10s elapsed)
module.cluster.google_container_node_pool.pools: Still destroying... (ID: europe-west2-a/k8s-ops/vpc-ops-np-1, 10s elapsed)
google_project_service.enable.1: Still destroying... (ID: some_project/container.googleapis.com, 10s elapsed)
module.lb-http.google_compute_firewall.default-hc: Still destroying... (ID: vpc-ops-lb-hc-0, 10s elapsed)
module.cluster.google_compute_firewall.ingress: Still destroying... (ID: vpc-ops-k8s-ingress, 10s elapsed)
module.lb-http.google_compute_global_forwarding_rule.http: Still destroying... (ID: vpc-ops-lb-http, 10s elapsed)
google_project_service.enable.0: Still destroying... (ID: some_project/compute.googleapis.com, 10s elapsed)
module.lb-http.google_compute_global_forwarding_rule.http: Destruction complete after 11s
module.lb-http.google_compute_target_http_proxy.default: Destroying... (ID: vpc-ops-lb-http-proxy)
google_project_service.enable[1]: Destruction complete after 12s
module.cluster.google_compute_firewall.ingress: Destruction complete after 13s
module.lb-http.google_compute_firewall.default-hc: Destruction complete after 17s
module.lb-http.google_compute_firewall.fw-lb: Destruction complete after 18s
module.lb-http.google_compute_global_address.default: Destroying... (ID: vpc-ops-lb-address)
module.cluster.google_container_node_pool.pools: Still destroying... (ID: europe-west2-a/k8s-ops/vpc-ops-np-1, 20s elapsed)
module.lb-http.google_compute_target_http_proxy.default: Still destroying... (ID: vpc-ops-lb-http-proxy, 10s elapsed)
module.lb-http.google_compute_target_http_proxy.default: Destruction complete after 12s
google_compute_url_map.custom: Destroying... (ID: some_project-url-map)
module.lb-http.google_compute_global_address.default: Still destroying... (ID: vpc-ops-lb-address, 10s elapsed)
module.lb-http.google_compute_global_address.default: Destruction complete after 10s
module.cluster.google_container_node_pool.pools: Still destroying... (ID: europe-west2-a/k8s-ops/vpc-ops-np-1, 30s elapsed)
google_compute_url_map.custom: Still destroying... (ID: some_project-url-map, 10s elapsed)
google_compute_url_map.custom: Destruction complete after 11s
module.lb-http.google_compute_backend_service.default: Destroying... (ID: vpc-ops-lb-backend-0)
module.cluster.google_container_node_pool.pools: Still destroying... (ID: europe-west2-a/k8s-ops/vpc-ops-np-1, 40s elapsed)
module.lb-http.google_compute_backend_service.default: Still destroying... (ID: vpc-ops-lb-backend-0, 10s elapsed)
module.lb-http.google_compute_backend_service.default: Destruction complete after 11s
module.lb-http.google_compute_health_check.default: Destroying... (ID: vpc-ops-lb-backend-0)
module.cluster.google_container_node_pool.pools: Still destroying... (ID: europe-west2-a/k8s-ops/vpc-ops-np-1, 50s elapsed)
module.lb-http.google_compute_health_check.default: Still destroying... (ID: vpc-ops-lb-backend-0, 10s elapsed)
module.lb-http.google_compute_health_check.default: Destruction complete after 10s
module.cluster.google_container_node_pool.pools: Still destroying... (ID: europe-west2-a/k8s-ops/vpc-ops-np-1, 1m0s elapsed)

Error: Error applying plan:

3 error(s) occurred:

* google_project_service.enable[0] (destroy): 1 error(s) occurred:

* google_project_service.enable.0: Error disabling service: Error disabling service "compute.googleapis.com" for project "some_project": Error code 9, message: Could not turn off service, as it still has resources in use.:[compute.googleapis.com]
* module.cluster.output.K8S_INSTANCE_GROUP_URLS: Resource 'data.google_container_cluster.information' does not have attribute 'instance_group_urls' for variable 'data.google_container_cluster.information.instance_group_urls'
* module.cluster.google_container_node_pool.pools (destroy): 1 error(s) occurred:

* google_container_node_pool.pools: Error waiting for deleting GKE NodePool: googleapi: Error 403: Required "container.operations.get" permission for "projects/some_project/zones/europe-west2-a/operations/operation-1524131461021-7acd4173"., forbidden

Expected Behavior

The resources google_project_iam_binding and google_project_service should have been the very last resources to be deleted, i.e.:

  • everything else
  • google_project_service
  • google_project_iam_binding - this has to be last, because google_project_iam_binding overwrites the 'owner' role membership when re-applying Terraform. So if you try to add people using this, it will remove any owners not listed here. Hence all owners are assigned with a count via this resource, including the service account that I use to create the project and everything else.

tl;dr - google_project_iam_binding overwrites role membership rather than appending. I should have used google_project_iam_member instead.

Actual Behavior

Terraform seemed to go free range and just start deleting things. The two most important resources are deleted first, which leads to subsequent issues:

  1. I'm now locked out of the project, because it's removed both me and the service account.
  2. Terraform cannot remove other resources, because it's disabled the services. It has also removed the service account performing the actions from the project, so even if the services were still enabled, there's no longer an account to run with. You can see this in the error when trying to remove the node pool.

Steps to Reproduce

  1. Terraform config that creates a project, an iam_binding, enables the 'compute' service, and builds a compute instance in the created project. Preferably use a module for the compute instance, as that will show up any depends_on flaws, and my infra is all based on various modules (see my HCL example for reference).
  2. Build it out.
  3. Try to destroy everything and see in what order Terraform does things.

Important Factoids

IAM

I initially used google_project_iam_binding only to add owners (I did not include the service account). However, after creating the project and adding this permission, I tried to re-apply and noticed that it was going to remove the service account and add the new permissions instead.
Hence, in this scenario, I added the service account into this resource (using count) alongside the additional members.

I used google_project_iam_binding rather than google_project_iam_policy explicitly because I noted this warning:
[Screenshot: the "Be careful!" warning from the google_project_iam_policy documentation]

Obviously you can also lock yourself out with the google_project_iam_binding resource if it's not used correctly (I'm guessing there are some simple tricks to avoid what I did). Perhaps the warning should apply to all of these resources, as I assumed it was specific to the policy resource.

Outputs

There's also an error there around a data object:

* module.cluster.output.K8S_INSTANCE_GROUP_URLS: Resource 'data.google_container_cluster.information' does not have attribute 'instance_group_urls' for variable 'data.google_container_cluster.information.instance_group_urls'

Perhaps this indicates an ordering issue, or an issue with data objects generally. In my case the data object is used post node_pool creation, to capture all the instance_group_urls.

data "google_container_cluster" "information" {
  name       = "${google_container_cluster.cluster.name}"
  zone       = "${google_container_cluster.cluster.zone}"
  project    = "${var.META_PROJECT}"
  depends_on = ["google_container_node_pool.pools"]
}

output "K8S_INSTANCE_GROUP_URLS" {
  value       = "${data.google_container_cluster.information.instance_group_urls}"
  description = "URLs to the instance groups for all nodes"
}

I actually don't care about this output during destruction; I can't think of a scenario where I would. I've had this issue for a while, and usually setting export TF_WARN_OUTPUT_ERRORS=1 will suppress it, but the default behaviour in this scenario should be to ignore the output entirely. I'm destroying it, which should mean 'I accept there's not going to be an output, because there will be no data once the cluster is destroyed'.

It looks like it's trying to get data on something it's destroyed in the process of destroying. Perhaps the simple fix is to add a flag to skip data source lookups on destroy?

My Modules

My modules all work when used inside a project; including the code for them would mean a lot more copy-paste, and it wouldn't add anything in particular to this issue.

References

#1292 - Slightly related in the discussion about how dependency on deletion is formed. service and iam_member resources both seem to get deleted early on.

@nat-henderson
Contributor

Hi, thanks for the detailed error report. I'm sorry that you had trouble with the google provider, but unfortunately there's not going to be much we can do to make this process less painful. It seems like there are three issues here:

Deletion Order

Terraform's deletion order is the reverse of its creation order, which is entirely based on the dependency graph. Unfortunately, cross-resource dependencies of this sort are not expressible in a terraform provider - you need to handle this on your end, ideally by having all your resources transitively depend on google_project_service, which itself depends on google_project_iam_binding, using a depends_on clause where it's not possible to depend via interpolation.
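Sketched with the resource names from the original config, that chain might look like the following (a minimal sketch, not a drop-in fix — every other resource would still need to depend, directly or transitively, on google_project_service.enable):

```hcl
# Services explicitly depend on the IAM binding, so Terraform creates the
# binding first and - because destroy order is the reverse of create order -
# destroys it last.
resource "google_project_service" "enable" {
  count   = "${length(var.PROJECT_SERVICES)}"
  project = "${google_project.project.project_id}"
  service = "${element(var.PROJECT_SERVICES, count.index)}"

  depends_on = ["google_project_iam_binding.admins"]
}
```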

IAM's dangerousness

An IAM policy is extremely dangerous: without ever touching the owners of a project, applying a policy that omits them will automatically remove all of them. An IAM binding is considered less dangerous, because you are explicitly setting the list of owners - if you don't include an account that you have access to in the binding, you can still lock yourself out of the project, but it's less likely and less surprising than with the policy. You did the right thing by using a binding. If you think a warning on those docs pages would have helped you avoid this issue, I'd approve a PR adding one - or just suggest some wording here and I'll add it myself.

Output warnings

I agree with you that that's an unnecessary warning - that issue is best filed with Terraform core because it's not specific to the google provider.

@hawksight
Author

hawksight commented May 1, 2018

Deletion Order

Thank you for this. I had always assumed Terraform had some basic behind-the-scenes ordering rules, such as 'delete compute resources before permissions resources'.
I can see now that it doesn't, and that it relies entirely on implicit and explicit dependencies.

In my case I am now using implicit dependencies, because depends_on is not available between modules. I just hadn't realised I needed them here, though I had used them in other cases.
(Thanks to @mrparkers for assistance on Slack.)
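For example, an interpolated module input is enough to create the implicit edge in the graph. This is only a sketch: enabled_services is a hypothetical variable the module would have to declare, and the source path is illustrative.

```hcl
# Terraform 0.11 modules don't support depends_on, but referencing the
# services' attributes from a module input creates the same implicit
# dependency, so the module's resources are destroyed before the services.
module "network" {
  source           = "./modules/network"
  enabled_services = ["${google_project_service.enable.*.service}"]
}
```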

IAM's dangerousness

My case was a combination of not realising that binding is authoritative (which is in the docs), using the permission during development (spin up and teardown), and the ordering issue mentioned above.

It's probably easier for you to add this to the docs, as I'm not set up to contribute at the moment. Feel free to reword / organise my comments into something coherent with the current content.

This would be my contribution of a note (or equivalent) to be applied for all three resources.

Considerations for destruction of permissions resources:

  • Consider the order of deletion, as removal of authoritative resources (google_project_iam_policy and google_project_iam_binding) could leave you with no permissions to remove other resources. Look carefully at your implicit and explicit dependencies.
  • Authoritative resources could end up removing your active account / service account from the permissions, if you have added it.
  • Pay particular attention to roles/owner. Make sure you have accounted for the project creator, who will have this binding. If you need to use this, consider starting with google_project_iam_member, which adds members and, upon destruction, removes only those you explicitly added.
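As a sketch of that last point (var.PROJECT_OWNERS is a hypothetical list variable, following the count pattern from my original config):

```hcl
# google_project_iam_member is additive: destroying these resources removes
# only the members declared here, leaving the rest of the role's membership
# (e.g. the project creator) intact.
resource "google_project_iam_member" "owners" {
  count   = "${length(var.PROJECT_OWNERS)}"
  project = "${google_project.project.project_id}"
  role    = "roles/owner"
  member  = "${element(var.PROJECT_OWNERS, count.index)}"
}
```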

And I would probably consider adding the same "Be careful!" warning to google_project_iam_binding as is already present on google_project_iam_policy (as screenshotted in the initial report).

Output warnings

I will raise an issue on the main Terraform repository and link back to here when I get around to it.

On reflection, and actually 'reading' (not just scanning) the docs for these resources, they make a lot of sense and the warnings are there, perhaps just not as explicit.

I think adding some additional explicit warnings should close off this ticket. The ordering is not 'wrong'; it's just difficult to get right.
Also, my understanding of the ordering was not up to scratch, and the outputs are a separate issue.

@paddycarver
Contributor

Our IAM docs have been updated since this was opened, but I tried to apply the spirit of it to #1735. I opted not to call out the dependency ordering info, as that's already covered in the getting started guide for Terraform, so I think it's inappropriate to document it at the provider level. I also worry about losing the forest for the trees in our documentation by being too verbose; after all, a warning to be careful about order of deletion could apply to most of our resources, which could return errors if they're not deleted in the proper order. I think that information is better suited to the getting started guide, and that the provider docs should focus on documenting what's unique to the provider.

@ghost

ghost commented Nov 17, 2018

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

@ghost ghost locked and limited conversation to collaborators Nov 17, 2018