Update/replace resource when a dependency is changed #8099

Closed
OJFord opened this issue Aug 10, 2016 · 45 comments · Fixed by #30900

Comments

@OJFord
Contributor

OJFord commented Aug 10, 2016

resource "foo" "bar" {
    foobar = "${file("foobar")}"
}

resource "bar" "foo" {
    depends_on = ["foo.bar"]
}

bar.foo is not modified if the file 'foobar' changed without otherwise changing the resource that includes it.

@radeksimko
Member

Hi @OJFord
would you mind providing a more concrete example, with real resources, that would help us reproduce the unexpected behaviour you described?

Thanks.

@radeksimko added the "bug", "waiting-response", and "core" labels Aug 11, 2016
@cemo

cemo commented Oct 7, 2016

@radeksimko please check the referenced issue #6613. This is pretty important and can be hit in other places as well. From my experiments, I observed that depends_on only affects ordering; it does not trigger a change.

@apparentlymart
Contributor

Hi @OJFord and @cemo,

In Terraform's design, a dependency edge (which is what depends_on creates explicitly) is used only for ordering operations. So in the very theoretical example given in the issue summary, Terraform knows that when it's doing any operation that affects both foo.bar and bar.foo it will always do the operation to foo.bar first.

I think you are expecting an additional behavior: if there is an update to foo.bar then there will always be an automatic update to bar.foo. But that is not actually how Terraform works, by design: the dependency edges are used for ordering, but the direct attribute values are used for diffing.

So in practice this means that the bar.foo in the original example will only get an "update" diff if any of its own attributes are changed. To @radeksimko's point it's hard to give a good example without a real use-case, but the way this would be done is to interpolate some attribute of foo.bar into bar.foo such that an update diff will be created on bar.foo whenever that attribute changes. Note that it's always attribute-oriented... you need to interpolate the specific value that will be changing.
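To illustrate with the abstract example from the issue summary (assuming, purely for the sketch, that bar.foo has some argument, here called "description", that can carry the value):

resource "foo" "bar" {
    foobar = "${file("foobar")}"
}

resource "bar" "foo" {
    # Referencing the attribute creates both the ordering edge and a diff on
    # bar.foo whenever foo.bar's "foobar" value changes.
    description = "${foo.bar.foobar}"
}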

In practice this behavior does cause some trouble on edge cases, and those edge cases are what #4846 and #8769 are about: allowing Terraform to detect the side effects of a given update, such as the version_id on an Amazon S3 object implicitly changing each time its content is updated.

Regarding your connection to that other issue @cemo, you are right that the given issue is another one of these edge cases, though a slightly different one: taking an action (deploying) directly in response to another action (updating some other resource), rather than using attribute-based diffing... though for this API gateway case in particular, since API gateway encourages you to create a lot of resources, the specific syntax proposed there would likely be inconvenient/noisy.

Again as @radeksimko said a specific example from @OJFord might allow us to suggest a workaround for a specific case today, in spite of the core mechanisms I've described above. In several cases we have made special allowances in the design of a resource such that a use-case can be met, and we may be able to either suggest an already-existing one of these to use or design a new "allowance" if we have a specific example to work with. (@cemo's API gateway example is already noted, and there were already discussions about that which I will describe in more detail over there.)

@OJFord
Contributor Author

OJFord commented Oct 9, 2016

I'm sorry that I never came back with an example; I'm afraid I can't remember exactly what I was doing - but:

I think you are expecting an additional behavior: if there is an update to foo.bar then there will always be an automatic update to bar.foo. But that is not actually how Terraform works, by design: the dependency edges are used for ordering, but the direct attribute values are used for diffing.

is exactly right, that was what I misunderstood.

Perhaps something like taint_on_dependency_change = true is possible? That is, if such a variable is true, change the semantics of "ordering" above from "do this after, if it needs to be done" to "do this after".
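To sketch what I mean (hypothetical syntax only; no such option exists today):

resource "bar" "foo" {
    depends_on = ["foo.bar"]

    # Hypothetical: any change applied to foo.bar would also cause bar.foo
    # to be tainted/replaced, not just ordered after it.
    taint_on_dependency_change = true
}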

@cemo

cemo commented Oct 10, 2016

@OJFord the issue you don't remember might be #6613.

I second @OJFord's proposal and would expect something simple like taint_on_dependency_change. However, I can't claim to be an expert on Terraform, and since this is my first experiment with it my opinion might not carry much weight.

@apparentlymart
Contributor

apparentlymart commented Oct 10, 2016

This taint_on_dependency_change idea is an interesting one. I'm not sure I would actually implement it using the tainting mechanism, since that's more of a workflow management thing and indicates that the resource is "broken" in some way, but we could potentially think of it more like replace_on_dependency_change: artificially produce a "force new" diff any time a dependency changes.

I think this sort of thing would likely require some of the machinery from #6810 around detecting the presence of whole-resource diffs and correctly handling errors with them. There are some edge cases around what happens if B depends on A and A is changed but B encounters an error while replacing... since the intended change is not explicitly visible in the attributes, Terraform needs to make sure to do enough book-keeping that it knows it has more work to do when run again after the error is resolved.

It might work out conceptually simpler to generalize the triggers idea from null_resource or keepers from the random provider, so that it can be used on any resource:

resource "foo" "bar" {
    foobar = "${file("foobar")}"
}

resource "bar" "foo" {
    lifecycle {
        replace_on_change {
            foo_bar_foobar = "${foo.bar.foobar}"
        }
    }
}

In the above example, the lifecycle.replace_on_change attribute acts as if it were a resource attribute with "forces new resource" set on it: the arbitrary members of this map are stored in the state, and on each run Terraform will diff what's in the state with what's in the config and generate a "replace" diff if any of them have changed.

This effectively gives you an extra place to represent explicit value dependencies that don't have an obvious home in the resource's own attributes.

This is conceptually simpler because it can build on existing mechanisms and UX to some extent. For example, it might look like this in a diff:

-/+ bar.foo
    lifecycle.replace_on_change.foo_bar_foobar: "old_value" => "new value" (forces new resource)

In the short term we're likely to continue addressing this by adding special extra ForceNew attributes to resources where such behavior is useful, so that this technique can be used in a less-generic way where it's most valuable. This was what I'd proposed over in #6613, and has the advantage that it can be implemented entirely within a provider without requiring any core changes, and so there's much less friction to get it done. Thus having additional concrete use-cases would be helpful, either to motivate the implementation of a generic feature like above or to prompt the implementation of resource-specific solutions where appropriate.


For the moment I'm going to re-tag this one as "thinking" to indicate that it's an interesting idea but we need to gather more data (real use-cases) in order to design it well. I'd encourage other folks to share concrete use-cases they have in this area as separate issues, similar to what's seen in #6613, and mention this issue by number so that it can become a collection of links to relevant use-cases that can inform further design.

@apparentlymart added the "thinking" label and removed the "waiting-response" label Oct 10, 2016
@apparentlymart changed the title from "depends_on doesn't also depend on the dependency's file() inclusion" to "Update/replace resource when a dependency is changed" Oct 10, 2016
@cemo

cemo commented Nov 16, 2016

@mitchellh This issue might be considered for the 0.8 release, since you improved depends_on, and this might be a quick win.

@ckyoog

ckyoog commented Jul 7, 2017

resource "aws_appautoscaling_target" "ecs_target" {
  max_capacity       = "${var.max_capacity}"
  min_capacity       = "${var.min_capacity}"
  role_arn           = "${var.global_vars["ecs_as_arn"]}"

  resource_id        = "service/${var.global_vars["ecs_cluster_name"]}/${var.ecs_service_name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "ecs_cpu_scale_in" {
  adjustment_type         = "${var.adjustment_type}"
  cooldown                = "${var.cooldown}"
  metric_aggregation_type = "${var.metric_aggregation_type}"

  name                    = "${var.global_vars["ecs_cluster_name"]}-${var.ecs_service_name}-cpu-scale-in"
  resource_id             = "service/${var.global_vars["ecs_cluster_name"]}/${var.ecs_service_name}"
  scalable_dimension      = "ecs:service:DesiredCount"
  service_namespace       = "ecs"

  step_adjustment {
    metric_interval_upper_bound = "${var.scale_in_cpu_upper_bound}"
    scaling_adjustment          = "${var.scale_in_adjustment}"
  }

  depends_on = ["aws_appautoscaling_target.ecs_target"]
}

Hi @apparentlymart,

Here is another real user-case of my own.

resource aws_appautoscaling_policy.ecs_cpu_scale_in (call it the autoscaling policy) depends on resource aws_appautoscaling_target.ecs_target (call it the autoscaling target).

When I change the value of max_capacity and then run terraform plan, it shows that the autoscaling target is forced to new (it is going to be destroyed and re-added), but nothing will happen to the autoscaling policy, which is supposed to be destroyed and re-added as well.

Why is it supposed to? Because in my experience, after terraform apply succeeds (which destroys and re-adds the autoscaling target successfully), the autoscaling policy is gone automatically (if you log in to the AWS console, you can see it's gone), so I have to run terraform apply a second time, and this time it will add the autoscaling policy back.

(BTW, both resources are actually defined in a module; maybe it matters, maybe not, I'm not sure.)

@apparentlymart
Contributor

Hi @ckyoog! Thanks for sharing that.

What you described there sounds like what's captured in hashicorp/terraform-provider-aws#240. If you think it's the same thing, it would be cool if you could post the same details in that issue since having a full reproduction case is very useful. I think in your particular case this is a bug that we ought to fix in the AWS provider, though you're right that if the feature I described in my earlier comment were implemented it could in principle be used as a workaround.

In the meantime, you might already be able to work around this by including an additional interpolation in your policy name to force it to be recreated when the target is recreated:

  name = "${var.global_vars["ecs_cluster_name"]}-${var.ecs_service_name}-cpu-scale-in-${aws_appautoscaling_target.ecs_target.id}"

Since the name attribute forces new resource, this should cause the policy to get recreated each time the target is recreated.

@ckyoog

ckyoog commented Jul 8, 2017

Thank you @apparentlymart for the workaround. Sure, I will post my case to issue hashicorp/terraform-provider-aws#240.

@zopanix
Contributor

zopanix commented Oct 10, 2017

Hey, I just got an idea of how this might be solved. The approach is inspired from Google Cloud, and I don't know if it will apply to all use cases.
Basically, in Google Cloud you have the notion of "used by" and "uses" on resources. For example, consider the link between a boot_disk and an instance: the boot_disk can exist alone as a simple disk, but the instance cannot exist without a boot disk. Therefore, in the data model, you can have a generic system that states used_by.

Example:

resource "google_compute_disk" "bastion_boot" = {
  image = "centos-7"
  size    = "10"
  used_by = ["${google_compute_instance.bastion.name}"]
}

resource "google_compute_instance" "bastion" = {
  boot_disk = {
    source = "${google_compute_disk.bastion_boot.name}"
  }
  uses = ["${google_compute_disk.bastion_boot.name}"]
}

The uses and used_by could be implicitly set in well-known cases but explicitly set in some user and/or corner cases. It would become the provider's responsibility to know about the implicit uses, and as a workaround it would be possible to use the explicit form.

It would work much like the implicit and explicit forms of depends_on, except in reverse.

Now, I understand there are some subtle differences among the problems that have been mentioned (e.g. "I don't want to destroy, I want to update a resource"). I don't know how my case would fit into this.

Also, I think it would be best to stick with the cloud provider's semantics, and in my case it really reflects what I'm doing and how everything works. This system would be a reverse depends_on, creating a possible destruction cycle that would be triggered before the create cycle. That would be fine in most cases, and if you cannot tolerate a destruction, you usually apply a blue-green model anyway, which avoids much of the pain. But in my case, during my maintenance windows, I can be destructive with most of my resources.

Just some related issues:
#16065 #16200

@alethenorio

I have run into the need for this feature myself.

The use case is the following:

I have a resource for a database instance (in this case an AWS RDS instance) which performs a snapshot of its disk upon destruction. If I destroy this resource, recreate it, and destroy it again, AWS returns an error because it will attempt to create a snapshot with the same identifier as before.

This can be mitigated by using something like the "random_id" resource as a suffix/prefix to that identifier. The issue is that if I taint the database resource, I need to remember to manually taint the "random_id" resource as well, otherwise the new instance will have the same "random_id" as before.

Attempting to use a "keepers" pointing to the database resource id does not work because it causes a cyclic dependency.
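For illustration, a minimal sketch of the cycle (resource names and most arguments here are hypothetical):

resource "random_id" "snapshot_suffix" {
  byte_length = 4

  # keepers reference the database instance...
  keepers = {
    db_instance_id = aws_db_instance.example.id
  }
}

resource "aws_db_instance" "example" {
  # ...while the instance references random_id back for its snapshot name,
  # so Terraform reports a dependency cycle.
  final_snapshot_identifier = "example-final-${random_id.snapshot_suffix.hex}"

  allocated_storage = 20
  engine            = "postgres"
  instance_class    = "db.t3.micro"
  username          = "exampleuser"
  password          = "change-me"
}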

Any ideas on how one handles that?

@hogarthj

hogarthj commented Oct 15, 2019

If one needs to recreate an aws_lb_target_group that is currently the target of an aws_lb_listener_rule, the aws_lb_listener_rule needs to first be destroyed before the aws_lb_target_group can be recreated.

That's similar to what I'm bumping into and trying to work around right now ... I'm trying to evaluate a solution, and a "force_recreate/taint" in lifecycle, or similar, would be incredibly useful right now ...

In my case I have a target group that needs to be recreated, but the listener (no rule involved here) is only getting an "update in place" change ... but then the target group cannot be destroyed because the listener isn't being destroyed ...

For reference, for others searching: the issue for this in the AWS provider is being tracked in hashicorp/terraform-provider-aws#10233.

@psanzm

psanzm commented Nov 21, 2019

I was running into the same issue with the Google provider and the resources google_compute_resource_policy & google_compute_disk_resource_policy_attachment.

When you create a policy for scheduling the snapshots of a GCE disk, you must attach the policy to the disk. That policy isn't editable, so if you make any changes Terraform has to recreate the resource, but it doesn't recreate the attachment resource, even if it's "linked" with Terraform's depends_on directive.

Example of the resources:

resource "google_compute_resource_policy" "snapshot_schedule_wds" {
  name    = "snapshot-weekly-schedule-wds"
  region  = var.subnetwork_region
  project = google_project.mm-sap-prod.name

  snapshot_schedule_policy {
    schedule {
      weekly_schedule {
        day_of_weeks {
          day        = "SATURDAY"
          start_time = "20:00"
        }
      }
    }
    retention_policy {
      max_retention_days    = 366
      on_source_disk_delete = "KEEP_AUTO_SNAPSHOTS"
    }
    snapshot_properties {
      labels = {
        app     = "xxx"
      }
      storage_locations = ["europe-west6"]
      guest_flush       = false
    }
  }
}

resource "google_compute_disk_resource_policy_attachment" "gcp_wds_snap_schedule_pd_boot" {
  name = google_compute_resource_policy.snapshot_schedule_wds.name
  disk = google_compute_disk.web-dispatch-boot.name
  zone = var.zone
  project = google_project.mm-sap-prod.name

  depends_on = ["google_compute_resource_policy.snapshot_schedule_wds"]
}

Terraform version

Terraform v0.12.13
+ provider.external v1.2.0
+ provider.google v2.20.0
+ provider.google-beta v2.20.0

Any solution for this use case?

@pdecat
Contributor

pdecat commented Nov 27, 2019

@psanzm in this very specific use case, using the google_compute_resource_policy's id field, instead of name, in the google_compute_disk_resource_policy_attachment's name field allows it to work:

resource "google_compute_disk_resource_policy_attachment" "gcp_wds_snap_schedule_pd_boot" {
  name = google_compute_resource_policy.snapshot_schedule_wds.id
...

Note: it works because the actual values of name and id are the same, but the id is unknown upon recreation.

@sean-nixon

To add another example, here's a use case I recently ran into with Azure PostgreSQL. I wanted to upgrade the version of the PostgreSQL engine on the server, which requires replacement. The dependent resources, such as firewall rules and Postgres configurations, were not re-created, so I had to run through two applies. This is a common occurrence in Azure, where most IDs are based on the name of the resource, so if it is re-created the ID stays the same and dependent resources don't register the change.

resource "azurerm_postgresql_server" "pgsql_server" {
  name                = "examplepgsql"
  resource_group_name = "my-rg"
  location            = "eastus"

  sku {
    name     = "GP_Gen5_2"
    capacity = "2"
    tier     = "GeneralPurpose"
    family   = "Gen5"
  }

  storage_profile {
    storage_mb            = "51200"
    backup_retention_days = 35
    geo_redundant_backup  = "Enabled"
  }

  administrator_login          = var.admin_username
  administrator_login_password = var.admin_password
  version                      = "11"
  ssl_enforcement              = "Enabled"
}

resource "azurerm_postgresql_firewall_rule" "azure_services_firewall_rule" {
  name                = "AzureServices"
  resource_group_name = azurerm_postgresql_server.pgsql_server.resource_group_name
  server_name         = azurerm_postgresql_server.pgsql_server.name
  start_ip_address    = "0.0.0.0"
  end_ip_address      = "0.0.0.0"
}

resource "azurerm_postgresql_configuration" "log_checkpoints_pgsql_config" {
  name                = "log_checkpoints"
  resource_group_name = azurerm_postgresql_server.pgsql_server.resource_group_name
  server_name         = azurerm_postgresql_server.pgsql_server.name
  value               = "on"
}

@awilkins

awilkins commented Mar 4, 2020

Another use case:

I wanted to update an SSM parameter with the value of an AMI data block, but only when it changes.

This is for use with an Automation workflow like the example posted in the AWS docs.

My thought was: put in a null_resource that triggers when the AMI ID changes, and make the SSM parameter depend on this, but all a null_resource emits is an ID.

Aha, I thought, I'll do this:

data "aws_ami" "windows" {
  most_recent = true
  owners      = ["amazon"]
  filter {
    name   = "name"
    values = ["Windows_Server-2012-R2_RTM-English-64Bit-Base-*"]
  }
}

resource "null_resource" "new_windows_ami" {
  triggers = {
    base_ami_date = data.aws_ami.windows.creation_date
    force_update  = 1
  }
}

resource "aws_ssm_parameter" "current_windows_ami" {
  name  = "/ami/windows/2k12/current"
  value = data.aws_ami.windows.image_id
  type  = "String"

  tags = {
    BaseAmiTriggerId = null_resource.new_windows_ami.id
  }

  depends_on = [
    null_resource.new_windows_ami,
  ]
  # We only want the initial value from the data, we're going to replace this
  # parameter with the current "patched" release until there's a new base AMI
  overwrite = true
  lifecycle {
    ignore_changes = [
      value,
    ]
  }
}

... sadly, ignore_changes also suppresses the changes themselves. What I was hoping was that the change to the tag would be enough to trigger an update of the whole resource, but ignore_changes means that changes to the attributes' inputs are ignored for all purposes, not just for whether they trigger a lifecycle update.

This seems a shame because otherwise you could implement quite sophisticated lifecycle management with the null resource, concocting triggers with interpolations and such and only triggering an update to a dependent resource when the ID changed as a result.

@MarkKharitonov

I came to this thread from hashicorp/terraform-provider-azurerm#763. I do not know how this is connected, but that issue was closed in favour of hashicorp/terraform-provider-azurerm#326, which in turn was closed in favour of this one.

So, if you guys understand how the connection was made, here is another scenario, and a very real one: we modify the probing path on an Azure Traffic Manager and boom, its endpoints are gone. This is very frustrating. Is there an ETA on a fix for this issue?

@OJFord
Contributor Author

OJFord commented Mar 27, 2020

@MarkKharitonov This issue is essentially a feature request; what you're describing with Azure sounds like a bug though (but I haven't used Azure or read through those issues), so perhaps the link is 'sorry, nothing we can do without [this issue resolved], closing'.

I phrased it as a bug in the OP (and I should perhaps edit that) out of misunderstanding, but it's really a request for a form of dependency control that isn't possible (solely) with terraform today.

@MarkKharitonov

I do not understand. I have a traffic manager resource. The change does not recreate the resource - it is reported as an in-place replacement. Yet it blows away the endpoints. How come it is a feature request?

@OJFord
Contributor Author

OJFord commented Apr 20, 2020

@MarkKharitonov As I said, "what you're describing with Azure sounds like a bug", but this issue is a feature request, for something that does not exist in terraform core today.

Possibly the Azure resolution was 'nothing we can do without a way of doing [what is described here]' - I have no idea - but this issue itself isn't a bug, and is labelled 'thinking'. There's no guarantee there'll ever be a way of doing this, nevermind an ETA.

(I don't work for Hashicorp, I just opened this issue, there could be firmer internal plans for all I know, just trying to help.)

@MarkKharitonov

I do not know what to do. There are real issues in the provider that are being closed with the claim that they are because of this one, but this one is apparently huge in scope. So I do not understand what I am supposed to do. Should I open yet another issue in the Terraform providers, referencing the already-closed ones and this one? How do we attract attention to the real bug without it being closed for nothing, which has already happened twice?

@sean-nixon

@MarkKharitonov I'm not an expert on Terraform or Terraform provider development, so someone else please correct me if I'm wrong, but I don't think there's anything that can be done in the provider. The issues in the Azure provider are caused by a limitation of Terraform, not by a bug in the AzureRM provider that can be fixed. Based on the comments in this issue, there is a fundamental mismatch between how the Azure API works and how Terraform handles dependencies. Azure's API does not generate unique IDs for resources; IDs are derived from names. So if you have a child resource that references a parent resource by ID, the ID doesn't change even if that parent resource is re-created. From Terraform's perspective, that means no attribute changed on the child resource, since the ID it references is the same, even though in actuality the child resource was also destroyed together with the parent resource. The feature request here, as I understand it, is to add additional intelligence to Terraform dependencies: use them not just for ordering resource creation, but also to detect that a dependency (e.g. a parent resource) was destroyed/re-created and trigger a destroy/re-create on the dependent resource (e.g. a child resource), irrespective of whether any attributes on the child resource have changed.
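For example (illustrative only, reusing the names from the Azure PostgreSQL example above), the ID a child resource references is built entirely from names:

# azurerm_postgresql_server.pgsql_server.id resolves to something like:
#   /subscriptions/<subscription-id>/resourceGroups/my-rg/providers/Microsoft.DBforPostgreSQL/servers/examplepgsql
# Destroying and re-creating the server yields exactly the same ID, so a child
# resource that only references that ID sees no attribute change and no diff.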

@steph-moto

This issue appears really critical and not a feature request at all. The fundamental purpose of Terraform is to make sure any missing changes are applied when required. In this case, not having Terraform create a dependency on a parent resource's recreation is fundamentally an issue.

Could someone clarify whether the problem of authorization rules not being re-created when their associated event hub is re-created has been present for a long time? Is there any previous version of AzureRM or Terraform that would mitigate the issue until this gets resolved?

Because the only approach I can see to work around this issue is to invoke the Terraform deployment twice, which to me is nonsense.

@Backscratcher

Hey!
I have another example of this behaviour. Changes to modules that force recreation of resources inside the module (resources used by a dashboard) don't update the dashboard, which ends up referencing the configuration from before the apply. Another apply will actually pick up those changes and alter the dashboard_json template. The weird thing is that changes to aws_instance.cron are picked up at the time of the first apply, but changes to the modules are not.

data "template_file" "dashboard_json" {
  template = file("${path.module}/templates/cloudwatch_dashboard/dashboard.tpl")
  vars = {
    rds_instance_id                      = module.database.rds_instance_id
    region                               = var.aws_region
    asg_normal_name                      = module.autoscaling_group.aws_autoscaling_group_name-normal
    cron_instance_id                     = aws_instance.cron.id
    lb_arn_suffix                        = module.load_balancer.aws_lb_arn_suffix
    lb_target_group_arn_suffix           = module.load_balancer.aws_lb_target_group_target_group_arn_suffix
    lb_blackhole_target_group_arn_suffix = module.load_balancer.aws_lb_target_group_target_group_blackhole_arn_suffix
    lb_redash_target_group_arn_suffix    = aws_lb_target_group.redash.arn_suffix
    procstats_cpu                        = (length(var.cron_procstats[local.environment]) > 0) ? data.template_file.dashboard_procstats_cpu.rendered : ""
    procstats_mem                        = (length(var.cron_procstats[local.environment]) > 0) ? data.template_file.dashboard_procstats_mem.rendered : ""
    # force recreation of the dashboard due to weird behaviour when changes to modules above
    # are not picked up by terraform and dashboard is not being updated
    force_recreation = var.force_dashboard_recreation[local.environment] ? "${timestamp()}" : ""
  }
}

resource "aws_cloudwatch_dashboard" "main" {
  dashboard_name = "${var.project_name}-${local.environment}-dashboard"
  dashboard_body = data.template_file.dashboard_json.rendered
}

I tried using depends_on, thinking maybe the ordering would help with it, but it didn't, so I ended up using timestamp() to force recreation.

@kustodian

We have the exact same problem on GCP, which is described in detail in this issue: hashicorp/terraform-provider-google#6376.

Here is part of the relevant config:

resource "google_compute_region_backend_service" "s1" {
  name = "s1"

  dynamic "backend" {
    for_each = google_compute_instance_group.s1
    content {
      group = backend.value.self_link
    }
  }
  health_checks = [
    google_compute_health_check.default.self_link,
  ]
}

resource "google_compute_health_check" "default" {
  name = "s1"
  tcp_health_check {
    port = "80"
  }
}

resource "google_compute_instance_group" "s1" {
  count   = local.s1_count
  name    = format("s1-%02d", count.index + 1)
  zone    = element(local.zones, count.index)
  network = data.google_compute_network.network.self_link
}

I'm not sure if this is a general TF problem or a Google provider problem, but here it goes.
Currently it's not possible to lower the number of google_compute_instance_group resources that are used in a google_compute_region_backend_service. In the code above, if we lower the number of google_compute_instance_group resources and try to apply the configuration, TF will first try to delete the unneeded instance groups and then update the backend configuration, but that order doesn't work because you cannot delete an instance group that is used by the backend service; the order should be the other way around.

So to sum it up, when I lower the number of the instance group resources TF does this:

  1. delete surplus google_compute_instance_group -> this fails
  2. update google_compute_region_backend_service

It should do this the other way around:

  1. update google_compute_region_backend_service
  2. delete surplus google_compute_instance_group

What I don't understand is why TF doesn't know that it should do the update first, then remove the instance groups. When I run destroy, TF does it correctly: it first destroys the backend service, then the instance groups.

Also, this is very hard to fix, because you need to make a temporary config change, apply, then set the final config you want and apply again.

@lorengordon
Contributor

@kustodian Can you use create_before_destroy in google_compute_instance_group?

resource "google_compute_instance_group" "s1" {
  count   = local.s1_count
  name    = format("s1-%02d", count.index + 1)
  zone    = element(local.zones, count.index)
  network = data.google_compute_network.network.self_link

  lifecycle {
    create_before_destroy = true
  }
}

@kustodian

@lorengordon I can, but it doesn't help. TF works exactly the same in my example with or without create_before_destroy = true.

To be honest I'm not entirely sure that my issue is the same thing as what the issue reporter is describing.

@OJFord
Contributor Author

OJFord commented May 26, 2020

@apparentlymart May I suggest locking this issue? I suspect you and the team probably have enough examples and use cases to consider this feature now?

I could 'unsubscribe' of course, it's just that I would like to be notified if/when there's a decision, some progress, or something to help test. Cheers. 🙂

@brandocorp

brandocorp commented May 27, 2020

Edit: It turns out this is really a function of kubernetes, and not really a terraform concern.

Just adding my 0.02. This is also an issue with the kubernetes provider and secrets/config maps. A service using an updated config map or secret doesn't detect the change because the underlying pods of the service need to be restarted or recreated to detect the changes.

resource "kubernetes_secret" "value" {
  metadata {
    name      = "k8s-secret-value"
    namespace = "private"
  }

  data = {
    secret = var.secret_value
  }
}

resource "kubernetes_deployment" "service" {
  metadata {
    name      =  "internal-service"
    namespace = "private"
  }
  spec {


    template {


      spec {
        container {


          env {
            name = "SECRET_VALUE"

            value_from {
              secret_key_ref {
                name = kubernetes_secret.value.metadata.0.name
                key  = "secret"
              }
            }
          }
        }
      }
    }
  }
}

If the value for the secret key is updated, nothing seems to happen with the deployment.
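(A common workaround, sketched here with hypothetical names and a deliberately minimal pod spec, is to hash the secret data into a pod-template annotation so the Deployment rolls whenever the secret changes:)

resource "kubernetes_deployment" "service" {
  metadata {
    name      = "internal-service"
    namespace = "private"
  }

  spec {
    replicas = 1

    selector {
      match_labels = {
        app = "internal-service"
      }
    }

    template {
      metadata {
        labels = {
          app = "internal-service"
        }

        annotations = {
          # Hypothetical annotation key: hashing the secret data here changes the
          # pod template whenever the secret changes, which triggers a rollout.
          "example.com/secret-checksum" = sha256(jsonencode(kubernetes_secret.value.data))
        }
      }

      spec {
        container {
          name  = "internal-service"
          image = "example/internal-service:latest"

          env {
            name = "SECRET_VALUE"

            value_from {
              secret_key_ref {
                name = kubernetes_secret.value.metadata.0.name
                key  = "secret"
              }
            }
          }
        }
      }
    }
  }
}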

@hashicorp locked as off-topic and limited conversation to collaborators May 29, 2020
@danieldreier
Contributor

I'm going to lock this issue for the time being, because the remaining discussion seems largely to be people supporting each other in workarounds.

I’m happy to see people are helping each other work around this, and I've created a thread for this on the community forum so that people can continue these discussions without creating excess noise for people who just want succinct updates in GitHub.
