
Update dependency before destroy #25010

Closed
kustodian opened this issue May 21, 2020 · 13 comments
Labels
bug core v0.12 Issues (primarily bugs) reported against v0.12 releases

Comments

@kustodian

Terraform Version

Terraform v0.12.24

Terraform Configuration Files

I created hashicorp/terraform-provider-google#6376 in the Google provider, but I was told this is a TF issue. I'm also not sure if this issue is the same issue as #8099, so I created a new one.

In the config below, the problem is that when you want to lower the number of google_compute_instance_group resources used in a google_compute_region_backend_service, it's not possible, because you cannot delete an instance group that is still being used by a backend service.

Here is an example config:

locals {
  project         = "<project-id>"
  network         = "<vpc-name>"
  network_project = "<vpc-project>"
  zones           = ["europe-west1-b", "europe-west1-c", "europe-west1-d"]
  s1_count        = 3
}

provider "google" {
  project = local.project
  version = "~> 3.0"
}

data "google_compute_network" "network" {
  name    = local.network
  project = local.network_project
}

resource "google_compute_region_backend_service" "s1" {
  name = "s1"

  dynamic "backend" {
    for_each = google_compute_instance_group.s1
    content {
      group = backend.value.self_link
    }
  }
  health_checks = [
    google_compute_health_check.default.self_link,
  ]
}

resource "google_compute_health_check" "default" {
  name = "s1"
  tcp_health_check {
    port = "80"
  }
}

resource "google_compute_instance_group" "s1" {
  count   = local.s1_count
  name    = format("s1-%02d", count.index + 1)
  zone    = element(local.zones, count.index)
  network = data.google_compute_network.network.self_link
}

Expected Behavior

When I lower the number of instance groups (e.g. set s1_count = 2 in the example above), TF should:

  1. Update google_compute_region_backend_service (remove the last instance group from it)
  2. Delete the surplus google_compute_instance_group

Actual Behavior

  1. Delete the surplus google_compute_instance_group -> fails
  2. Update google_compute_region_backend_service (remove the last instance group from it)

Here is the output:

google_compute_instance_group.s1[2]: Destroying... [id=projects/<project-id>/zones/europe-west1-d/instanceGroups/s1-03]

Error: Error deleting InstanceGroup: googleapi: Error 400: The instance_group resource 'projects/<project-id>/zones/europe-west1-d/instanceGroups/s1-03' is already being used by 'projects/<project-id>/regions/europe-west1/backendServices/s1', resourceInUseByAnotherResource

Steps to Reproduce

Set s1_count to a lower number than before and run terraform apply.

Additional Context

A very big problem with all this is that it's not easy to work around: it requires config hacks, and running apply multiple times doesn't help.

References

@StephenWithPH

StephenWithPH commented May 26, 2020

I experienced a very similar problem. In my case, each time terraform apply failed and reported the error it had received from GCP's API (always 400 resourceInUseByAnotherResource), I tainted the Terraform resource that depended on the thing I was trying to destroy.

By tainting up the chain, I was ultimately able to have a successful, single-pass terraform apply.

Because I was able to resolve this by tainting state, it feels like there should be a way for Terraform to detect this and handle it more gracefully, subject to those dependent resources also being defined in the same Terraform module.
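The taint-up-the-chain workaround described above can be sketched roughly as follows, using the resource addresses from the example config in this issue (the exact addresses depend on your own configuration, and whether a single taint suffices depends on how deep the dependency chain is):

```shell
# Mark the dependent backend service as tainted so Terraform plans
# to replace it, which releases its reference to the instance group:
terraform taint google_compute_region_backend_service.s1

# A single apply can then replace the backend service and destroy
# the surplus instance group in the same run:
terraform apply
```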

@danieldreier
Contributor

@apparentlymart can you help identify whether this is essentially the same issue as #8099?

@danieldreier danieldreier added the waiting-response An issue/pull request is waiting for a response from the community label May 29, 2020
@apparentlymart
Contributor

This feels slightly different than #8099 to me, because I think this issue is describing a problem of order of operations -- both of the changes are being planned and executed, but they are happening in a different order than expected -- whereas #8099 is talking instead about generating synthetic additional actions (such as "replace" actions) based only on dependencies.

@apparentlymart apparentlymart added bug core v0.12 Issues (primarily bugs) reported against v0.12 releases and removed waiting-response An issue/pull request is waiting for a response from the community labels Jun 26, 2020
@apparentlymart
Contributor

I've not deeply investigated this yet, but I've done some initial labeling of this as a bug until we can dig into it a bit further and understand what's going on here. There is some possibility that we may find that this is behaving as intended but that the design didn't consider this particular situation, in which case we can relabel this as an enhancement once we understand what exactly the use-case is that the current design isn't catering for.

@jbardin
Member

jbardin commented Jun 29, 2020

Hi @kustodian,

If I understand the situation correctly, it sounds like you're describing a use case for create_before_destroy. Using create_before_destroy does two things: the obvious one is that during replacement a new instance is created before the old one is destroyed. The second change in ordering is that dependencies are updated before the destroy, so rather than the default destroy -> create -> update, you would see create -> update -> destroy.

This ordering is also preserved when there is only a destroy operation, so that dependencies are still updated before the resource is destroyed. This latter part did not always work in 0.12, since it did not yet have the necessary dependency tracking, but I think it may still work in this particular case. If not, this case is definitely covered in 0.13.
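Applied to the config from this issue, the suggestion amounts to adding a lifecycle block to the instance group resource (a sketch of the suggested change, not a confirmed fix; the rest of the resource is unchanged from the original config):

```hcl
resource "google_compute_instance_group" "s1" {
  count   = local.s1_count
  name    = format("s1-%02d", count.index + 1)
  zone    = element(local.zones, count.index)
  network = data.google_compute_network.network.self_link

  lifecycle {
    # With create_before_destroy, dependents such as the backend
    # service are updated before the surplus group is destroyed.
    create_before_destroy = true
  }
}
```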

@kustodian
Author

In the case of destroying everything, this worked even without create_before_destroy, but if only a few instance groups need to be destroyed it doesn't work (at least in my example); if it worked, I wouldn't have created this issue :)

@jbardin how do you know that this is fixed in 0.13? Where can I see the issue that fixes this?

@kustodian
Author

I just tried 0.13beta2, and when I run apply of the configuration above, with create_before_destroy set on the instance group and one instance group needing to be deleted, this is the error that TF produces:

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes


Error: Provider produced inconsistent final plan

When expanding the plan for google_compute_region_backend_service.s1 to
include new values learned so far during apply, provider
"registry.terraform.io/hashicorp/google" produced an invalid new value for
.backend: planned set element
cty.ObjectVal(map[string]cty.Value{"balancing_mode":cty.StringVal("CONNECTION"),
"capacity_scaler":cty.NullVal(cty.Number),
"description":cty.NullVal(cty.String), "failover":cty.False,
"group":cty.StringVal("https://www.googleapis.com/compute/v1/projects/<project-id>/zones/europe-west1-b/instanceGroups/s1-01"),
"max_connections":cty.NullVal(cty.Number),
"max_connections_per_endpoint":cty.NullVal(cty.Number),
"max_connections_per_instance":cty.NullVal(cty.Number),
"max_rate":cty.NullVal(cty.Number),
"max_rate_per_endpoint":cty.NullVal(cty.Number),
"max_rate_per_instance":cty.NullVal(cty.Number),
"max_utilization":cty.NullVal(cty.Number)}) does not correlate with any
element in actual.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.


Error: Provider produced inconsistent final plan

When expanding the plan for google_compute_region_backend_service.s1 to
include new values learned so far during apply, provider
"registry.terraform.io/hashicorp/google" produced an invalid new value for
.backend: planned set element
cty.ObjectVal(map[string]cty.Value{"balancing_mode":cty.StringVal("CONNECTION"),
"capacity_scaler":cty.NullVal(cty.Number),
"description":cty.NullVal(cty.String), "failover":cty.False,
"group":cty.StringVal("https://www.googleapis.com/compute/v1/projects/<project-id>/zones/europe-west1-c/instanceGroups/s1-02"),
"max_connections":cty.NullVal(cty.Number),
"max_connections_per_endpoint":cty.NullVal(cty.Number),
"max_connections_per_instance":cty.NullVal(cty.Number),
"max_rate":cty.NullVal(cty.Number),
"max_rate_per_endpoint":cty.NullVal(cty.Number),
"max_rate_per_instance":cty.NullVal(cty.Number),
"max_utilization":cty.NullVal(cty.Number)}) does not correlate with any
element in actual.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.


Error: Provider produced inconsistent final plan

When expanding the plan for google_compute_region_backend_service.s1 to
include new values learned so far during apply, provider
"registry.terraform.io/hashicorp/google" produced an invalid new value for
.backend: block set length changed from 2 to 3.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.

If I remove create_before_destroy, TF fails in exactly the same way as 0.12.

@jbardin
Member

jbardin commented Jun 30, 2020

Thanks for testing that out @kustodian!

The part of that last set of errors which is core's responsibility should be fixed in the next beta coming out soon, or you could build from master to test. If there is still an error when using create_before_destroy after that, we can take a look at it, but it might end up being a provider issue to resolve any remaining problems.

@danieldreier
Contributor

@kustodian have you had an opportunity to test this on 0.13.0?

@danieldreier danieldreier added the waiting-response An issue/pull request is waiting for a response from the community label Aug 21, 2020
@kustodian
Author

I just tried it and it looks like it's the same thing. Here is the output of apply when I reduced the number of instance groups to 2:

% ./terraform apply
data.google_compute_network.network: Refreshing state... [id=projects/<vpc-project>/global/networks/test]
google_compute_health_check.default: Refreshing state... [id=projects/<project-id>/global/healthChecks/s1]
google_compute_instance_group.s1[1]: Refreshing state... [id=projects/<project-id>/zones/europe-west1-c/instanceGroups/s1-02]
google_compute_instance_group.s1[2]: Refreshing state... [id=projects/<project-id>/zones/europe-west1-d/instanceGroups/s1-03]
google_compute_instance_group.s1[0]: Refreshing state... [id=projects/<project-id>/zones/europe-west1-b/instanceGroups/s1-01]
google_compute_region_backend_service.s1: Refreshing state... [id=projects/<project-id>/regions/europe-west1/backendServices/s1]

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  ~ update in-place
  - destroy

Terraform will perform the following actions:

  # google_compute_instance_group.s1[2] will be destroyed
  - resource "google_compute_instance_group" "s1" {
      - id        = "projects/<project-id>/zones/europe-west1-d/instanceGroups/s1-03" -> null
      - instances = [] -> null
      - name      = "s1-03" -> null
      - network   = "https://www.googleapis.com/compute/v1/projects/<vpc-project>/global/networks/test" -> null
      - project   = "<project-id>" -> null
      - self_link = "https://www.googleapis.com/compute/v1/projects/<project-id>/zones/europe-west1-d/instanceGroups/s1-03" -> null
      - size      = 0 -> null
      - zone      = "europe-west1-d" -> null
    }

  # google_compute_region_backend_service.s1 will be updated in-place
  ~ resource "google_compute_region_backend_service" "s1" {
        affinity_cookie_ttl_sec         = 0
        connection_draining_timeout_sec = 0
        creation_timestamp              = "2020-08-20T23:50:09.457-07:00"
        fingerprint                     = "uNdHGEpOpkM="
        health_checks                   = [
            "https://www.googleapis.com/compute/v1/projects/<project-id>/global/healthChecks/s1",
        ]
        id                              = "projects/<project-id>/regions/europe-west1/backendServices/s1"
        load_balancing_scheme           = "INTERNAL"
        name                            = "s1"
        project                         = "<project-id>"
        protocol                        = "TCP"
        region                          = "europe-west1"
        self_link                       = "https://www.googleapis.com/compute/v1/projects/<project-id>/regions/europe-west1/backendServices/s1"
        session_affinity                = "NONE"
        timeout_sec                     = 30

      - backend {
          - balancing_mode               = "CONNECTION" -> null
          - capacity_scaler              = 0 -> null
          - failover                     = false -> null
          - group                        = "https://www.googleapis.com/compute/v1/projects/<project-id>/zones/europe-west1-b/instanceGroups/s1-01" -> null
          - max_connections              = 0 -> null
          - max_connections_per_endpoint = 0 -> null
          - max_connections_per_instance = 0 -> null
          - max_rate                     = 0 -> null
          - max_rate_per_endpoint        = 0 -> null
          - max_rate_per_instance        = 0 -> null
          - max_utilization              = 0 -> null
        }
      - backend {
          - balancing_mode               = "CONNECTION" -> null
          - capacity_scaler              = 0 -> null
          - failover                     = false -> null
          - group                        = "https://www.googleapis.com/compute/v1/projects/<project-id>/zones/europe-west1-c/instanceGroups/s1-02" -> null
          - max_connections              = 0 -> null
          - max_connections_per_endpoint = 0 -> null
          - max_connections_per_instance = 0 -> null
          - max_rate                     = 0 -> null
          - max_rate_per_endpoint        = 0 -> null
          - max_rate_per_instance        = 0 -> null
          - max_utilization              = 0 -> null
        }
      - backend {
          - balancing_mode               = "CONNECTION" -> null
          - capacity_scaler              = 0 -> null
          - failover                     = false -> null
          - group                        = "https://www.googleapis.com/compute/v1/projects/<project-id>/zones/europe-west1-d/instanceGroups/s1-03" -> null
          - max_connections              = 0 -> null
          - max_connections_per_endpoint = 0 -> null
          - max_connections_per_instance = 0 -> null
          - max_rate                     = 0 -> null
          - max_rate_per_endpoint        = 0 -> null
          - max_rate_per_instance        = 0 -> null
          - max_utilization              = 0 -> null
        }
      + backend {
          + balancing_mode = "CONNECTION"
          + failover       = false
          + group          = "https://www.googleapis.com/compute/v1/projects/<project-id>/zones/europe-west1-b/instanceGroups/s1-01"
        }
      + backend {
          + balancing_mode = "CONNECTION"
          + failover       = false
          + group          = "https://www.googleapis.com/compute/v1/projects/<project-id>/zones/europe-west1-c/instanceGroups/s1-02"
        }
    }

Plan: 0 to add, 1 to change, 1 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes


Error: Provider produced inconsistent final plan

When expanding the plan for google_compute_region_backend_service.s1 to
include new values learned so far during apply, provider
"registry.terraform.io/hashicorp/google" produced an invalid new value for
.backend: planned set element
cty.ObjectVal(map[string]cty.Value{"balancing_mode":cty.StringVal("CONNECTION"),
"capacity_scaler":cty.NullVal(cty.Number),
"description":cty.NullVal(cty.String), "failover":cty.False,
"group":cty.StringVal("https://www.googleapis.com/compute/v1/projects/<project-id>/zones/europe-west1-b/instanceGroups/s1-01"),
"max_connections":cty.NullVal(cty.Number),
"max_connections_per_endpoint":cty.NullVal(cty.Number),
"max_connections_per_instance":cty.NullVal(cty.Number),
"max_rate":cty.NullVal(cty.Number),
"max_rate_per_endpoint":cty.NullVal(cty.Number),
"max_rate_per_instance":cty.NullVal(cty.Number),
"max_utilization":cty.NullVal(cty.Number)}) does not correlate with any
element in actual.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.


Error: Provider produced inconsistent final plan

When expanding the plan for google_compute_region_backend_service.s1 to
include new values learned so far during apply, provider
"registry.terraform.io/hashicorp/google" produced an invalid new value for
.backend: planned set element
cty.ObjectVal(map[string]cty.Value{"balancing_mode":cty.StringVal("CONNECTION"),
"capacity_scaler":cty.NullVal(cty.Number),
"description":cty.NullVal(cty.String), "failover":cty.False,
"group":cty.StringVal("https://www.googleapis.com/compute/v1/projects/<project-id>/zones/europe-west1-c/instanceGroups/s1-02"),
"max_connections":cty.NullVal(cty.Number),
"max_connections_per_endpoint":cty.NullVal(cty.Number),
"max_connections_per_instance":cty.NullVal(cty.Number),
"max_rate":cty.NullVal(cty.Number),
"max_rate_per_endpoint":cty.NullVal(cty.Number),
"max_rate_per_instance":cty.NullVal(cty.Number),
"max_utilization":cty.NullVal(cty.Number)}) does not correlate with any
element in actual.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.


Error: Provider produced inconsistent final plan

When expanding the plan for google_compute_region_backend_service.s1 to
include new values learned so far during apply, provider
"registry.terraform.io/hashicorp/google" produced an invalid new value for
.backend: block set length changed from 2 to 3.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.

@ghost ghost removed waiting-response An issue/pull request is waiting for a response from the community labels Aug 21, 2020
@alisdair
Contributor

This has now been reopened upstream, and I therefore believe it is a provider bug, so I'm going to close the core issue. If you believe that is a mistake, please let me know!

@sivaramche

sivaramche commented Sep 18, 2020 via email

@ghost

ghost commented Oct 19, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked as resolved and limited conversation to collaborators Oct 19, 2020