
plugin.terraform-provider-google_v2.11.0_x4: panic: runtime error: invalid memory address or nil pointer dereference #5018

Closed
hallvors opened this issue Nov 28, 2019 · 19 comments · Fixed by GoogleCloudPlatform/magic-modules#3194, #5808 or hashicorp/terraform-provider-google-beta#1812

hallvors commented Nov 28, 2019

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
  • If an issue is assigned to the "modular-magician" user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to "hashibot", a community member has claimed the issue already.

Terraform Version

Terraform v0.12.16

  • provider.google v2.11.0
  • provider.google-beta v3.0.0-beta.1

Affected Resource(s)

  • google_v2.11.0_x4

Terraform Configuration Files

The crash appears to have something to do with this module: if my root module no longer references it, the crash goes away. Creating two managed instance groups from the same template should be fine, right?

resource "google_compute_instance_template" "default" {
  name_prefix = "${var.project_appname}-${var.target_environment}-instance-"
  description = "This template is used to create app server instances in a managed instance group. Managed by Terraform."

  tags = ["ssl", "http"]
  labels = {
    environment = var.target_environment
  }

  instance_description = "${var.project_appname}-${var.target_environment} instance. Managed by Terraform."
  machine_type         = "n1-standard-1"
  project              = var.google_project_name
  region               = var.google_region

  scheduling {
    automatic_restart   = true
    on_host_maintenance = "MIGRATE"
  }

  // Create a new boot disk from an image
  disk {
    source_image = var.img_link
    auto_delete  = true
    boot         = true
  }

  network_interface {
    network = "default"
    access_config {}
  }
  # TODO: not sure if these env vars are useful
  metadata_startup_script = "export APP=${var.project_appname}\nexport REPO=${var.project_repository}"
}

resource "google_compute_instance_group_manager" "webservers_backend" {
  provider    = google-beta
  name        = "${var.project_appname}-${var.target_environment}-backend"
  description = "Instance group, backend servers. Managed by Terraform."

  base_instance_name = "${var.project_appname}-${var.target_environment}-backend"
  zone               = var.google_zone

  version {
    name              = "app_instance_group"
    instance_template = google_compute_instance_template.default.self_link
  }


  target_size = 1

  named_port {
    name = "http"
    port = "8080"
  }


  auto_healing_policies {
    health_check      = google_compute_health_check.autohealing.self_link
    initial_delay_sec = 300
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "google_compute_instance_group_manager" "webservers_frontend" {
  provider    = google-beta
  name        = "${var.project_appname}-${var.target_environment}-frontend"
  description = "Instance group, frontend servers. Managed by Terraform."

  base_instance_name = "${var.project_appname}-${var.target_environment}-frontend"
  zone               = var.google_zone

  version {
    name              = "app_instance_group"
    instance_template = google_compute_instance_template.default.self_link
  }

  named_port {
    name = "http"
    port = "8080"
  }

  auto_healing_policies {
    health_check      = google_compute_health_check.autohealing.self_link
    initial_delay_sec = 300
  }

  lifecycle {
    create_before_destroy = true
  }
}

# some infrastructure-y things: health check, autoscaler

resource "google_compute_health_check" "autohealing" {
  provider            = google-beta
  name                = "${var.project_appname}-${var.target_environment}-autohealing-health-check"
  check_interval_sec  = 15
  timeout_sec         = 10
  healthy_threshold   = 2
  unhealthy_threshold = 10 # 50 seconds

  http_health_check {
    request_path = "/gcp_healtcheck"
    port         = "8080"
  }
}

resource "google_compute_autoscaler" "default" {
  provider = google-beta

  name   = "${var.project_appname}-${var.target_environment}-frontend-autoscaler"
  zone   = var.google_zone
  target = google_compute_instance_group_manager.webservers_frontend.self_link

  autoscaling_policy {
    max_replicas    = 5
    min_replicas    = 1
    cooldown_period = 60
  }
}

Debug Output

Console output: https://gist.github.com/hallvors/c41d3ca7bcc19fd1090f993ae25ee01a

Panic Output

https://gist.github.com/hallvors/95553bc0ac2cca81eae2f03f88d25262

Expected Behavior

No crash; resource setup completes.

Actual Behavior

It crashes consistently on both plan and apply when the run is nearly finished. Deleting the state (and the generated resources) makes plan work again, AFAIK.

Steps to Reproduce

I mostly just run

  1. terraform apply

with lots of -var arguments. The backend is remote on GCP.

Important Factoids

The config uses one plain google provider, one google-beta provider, and one google provider with an alias (to use a different service account auth file).
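
For reference, a minimal sketch of the provider setup described above; the alias name and the exact arguments are illustrative assumptions, not copied from the actual config (the variable names are borrowed from the reproduction command later in this thread):

provider "google" {
  credentials = file(var.service_account_file)
  project     = var.google_project_name
  region      = var.google_region
}

provider "google-beta" {
  credentials = file(var.service_account_file)
  project     = var.google_project_name
  region      = var.google_region
}

# Aliased google provider using a different service account auth file,
# selected in resources/modules via provider = google.dns (hypothetical alias).
provider "google" {
  alias       = "dns"
  credentials = file(var.service_account_file_dns)
  project     = var.google_dns_project_name
  region      = var.google_region
}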

References

None

@ghost ghost added the bug label Nov 28, 2019
hallvors (Author) commented:

Maybe this is related to the state JSON trying to list the "instances", while Google Compute Engine is basically starting and stopping them at will? So when Terraform runs and tries to compare its state with reality, there will likely be some inconsistencies?

plan and apply crash. destroy works, and the next plan/apply does not crash and appears to succeed. However, it soon starts crashing again. (FWIW, I'm deploying an app that is a work in progress and crashes a lot, so GCE is going to consider the instances unhealthy much of the time.)

hallvors (Author) commented:

I hacked the Terraform state and emptied the instances: [ ... ] arrays of all google_compute_instance_template entries. This prevented the crash, so I think I'm making some useful assumptions here.

Chupaka (Contributor) commented Nov 29, 2019

Can you check with a more recent version of the provider? 2.11 is quite old.

hallvors (Author) commented:

Hi @Chupaka, thanks. I'm a newbie, so I expected terraform init -upgrade to get the latest version(s) of things, but thanks to your comment I noticed that some copy-pasted code indeed pins the Google provider version to 2.11. I'll have to run some more deploys to be certain the issue is gone, but thanks a lot for the response. I will follow up after more testing.
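
For context, a hypothetical example of the kind of copy-pasted pin described above (Terraform 0.12 syntax); this is not the reporter's actual code. terraform init -upgrade only upgrades within the stated constraint:

# Pinned: terraform init -upgrade will not move past the 2.11.x series.
provider "google" {
  version = "~> 2.11.0"
}

# Relaxed/bumped constraint: lets terraform init -upgrade fetch newer releases.
# provider "google" {
#   version = "~> 3.1"
# }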

hallvors (Author) commented Dec 9, 2019

I tried with 3.1.0 now and it still seems to crash at the "Refreshing state..." step for google_compute_instance_template. I'm not sure if it is exactly the same problem, but it seems very similar:

module.slipway-servers.google_compute_instance_template.default: Refreshing state... [id=capua-staging-instance-20191209141641307900000001]

Error: rpc error: code = Unavailable desc = transport is closing

panic: runtime error: invalid memory address or nil pointer dereference
2019-12-09T16:19:51.044+0100 [DEBUG] plugin.terraform-provider-google_v3.1.0_x5: [signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0xd56d47]
2019-12-09T16:19:51.044+0100 [DEBUG] plugin.terraform-provider-google_v3.1.0_x5:
2019-12-09T16:19:51.044+0100 [DEBUG] plugin.terraform-provider-google_v3.1.0_x5: goroutine 71 [running]:
2019-12-09T16:19:51.044+0100 [DEBUG] plugin.terraform-provider-google_v3.1.0_x5: github.com/hashicorp/terraform-plugin-sdk/internal/helper/

etc.

@edwardmedia edwardmedia self-assigned this Jan 7, 2020
edwardmedia (Contributor) commented Jan 7, 2020

@hallvors Can you post the latest config file that you tested with the updated version 3.1.0? The provider was google-beta. If you still see the crash, please post both logs again. Thanks.

hallvors (Author) commented:

@edwardmedia Can I share a demo project with you so you can try to reproduce it?

@ghost ghost removed the waiting-response label Jan 23, 2020
edwardmedia (Contributor) commented:

@hallvors Yes, please. Let me know how I can repro this.

hallvors (Author) commented Jan 24, 2020

@edwardmedia I have attempted to create a demo and zipped it up here:
https://drive.google.com/file/d/1cuZkPmsWRjYxK-aZUyzMtXtK70dhllL4/view?usp=sharing

I have tried to remove secrets and set up a dedicated GCP project with a separate service account. Hopefully I have not published anything risky; if you spot anything, let me know. Also, kindly let me know when you have downloaded the zip so I can remove it, just in case.

(I suppose this zip now also contains some local terraform state and such.)

I first tried to write a demo from scratch to keep it as simple as possible, but that approach failed to reproduce the problem. So this is probably a bit more complex than it needs to be, but I've commented out some Ansible stuff and things that are probably not relevant. As I'm relatively new to both GCP and Terraform, the approaches may not make sense :) but I hope you'll be able to reproduce the crash.

Here's how to reproduce:

cd google-terraform-provider-crash/slipway-test-master
chmod +x ./slipway/init.sh
./slipway/init.sh

Terraform usually crashes on the second run of init.sh.

@ghost ghost removed the waiting-response label Jan 24, 2020
hallvors (Author) commented:

(I should also, of course, remove that project when you're done, since Google is probably charging for the test VMs :) )

edwardmedia (Contributor) commented Jan 24, 2020

@hallvors Got the zip file. Please remove it.

c2thorn (Collaborator) commented Feb 7, 2020

Hi @hallvors. Thanks for the effort in creating a demo for us to reproduce. However, would you mind uploading your demo to a GitHub repository or something a bit more transparent? Also, the simpler the demo, the easier it will be for us to pinpoint the root issue. Is there a way we can repro with a terraform apply instead of a shell script?

At first glance, it looks like an issue outside of our provider (the panic output shows a nil pointer on this line), but it will be hard to prove either way without being able to easily repro.

hallvors (Author) commented Feb 8, 2020

Hi @c2thorn, thanks for following up. As far as I remember, I set up the demo to actually create infrastructure in a (test) project on my employer's Google account, and I'm a bit worried about abuse if it's kept online. I can share the file if you email hallvord at minus dot no. The shell script runs a few Terraform init and apply commands; if you add set -x (or whatever it is) at the start you will see them, and it is possible to repro by just re-running the last one once everything is set up. Sorry about the complexity; I tried writing a very minimal demo but did not get to a point where it reproduced the problem.

Finally, I asked the Terraform devs first and they sent me here :)

@ghost ghost removed the waiting-response label Feb 8, 2020
c2thorn (Collaborator) commented Feb 13, 2020

Hey @hallvors, sorry for the delay. To help resolve your issue, let's figure out a way to get your configuration in a more transparent manner. Posting a Terraform config will not give other users access to your employer's infrastructure; if you remove the project/orgId, there shouldn't be any chance for malicious use. If you are still concerned, feel free to join our Slack channel and we can discuss details/configurations over direct messages.

hallvors (Author) commented Feb 14, 2020

OK @c2thorn, please clone this:
https://github.com/hallvors/google-terraform-provider-crash

Running terraform apply in slipway-test-master/slipway/terraform/rollout/ tends to cause the crash. But of course you need a bit of state first, and I call this from a shell script that sets plenty of required variables, some of them output from terraform commands in other subfolders, so the actual command is somewhat like this (lightly censored):

terraform apply -auto-approve -var service_account_file=/home/hallvord/repos/capua/config/local-secrets/<censored>-projects-38a4a139c534.json -var project_appname=capua -var google_project_name=<censored> -var service_account_file_dns=/home/hallvord/repos/capua/slipway/config/local-secrets/infrastructure-dns-key.json -var google_dns_project_name=minfrastructure -var google_dns_zone=test-no -var top_level_domain= -var public_server_name=example.no -var internal_server_name=example.no -var admin_server_name=admin.example.no -var google_region=europe-north1 -var google_zone=europe-north1-a -var project_repository=git@github.com:test/example.git -var branch=master -var update_disk_link=https://www.googleapis.com/compute/v1/projects/<censored>/zones/europe-north1-a/disks/capua-production-updatevm-disk -var img_name=capua-master-20200214-2135

@ghost ghost removed the waiting-response label Feb 14, 2020
c2thorn (Collaborator) commented Feb 20, 2020

Thanks for being so cooperative, @hallvors! Just wanted to provide an update: I've been able to repro the crash while limiting the scope down to just the servers and diskimage modules (servers alone didn't seem to do it). While a bit busy at the moment, I should be able to look for the root cause in the next couple of days. Thanks for your patience.

c2thorn (Collaborator) commented Feb 27, 2020

Another update here @hallvors. This is definitely a bug, but there may be a workaround before the fix is finished.

The relevant parts of the crash you are facing have to do with your google_compute_instance_template referencing your google_compute_image self_link in source_image. What I believe is happening is that your google_compute_image is created the first time your script calls terraform apply, but is then planned for recreation on the second terraform apply. This creates a situation where the google_compute_instance_template source_image reference is planned to change to a value that will not be known until after the apply is finished.

Normally, Terraform would handle this situation well, but unfortunately there is some custom code written for google_compute_instance_template that does not handle this edge case, resulting in the crash.

While this most certainly should be fixed, I don't think you actually intend to create your google_compute_image resource first and then modify it later, causing it to be destroyed/recreated. If you modify your setup to create the image resource once in its final state, I think that will prevent you from seeing the crash.
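
To illustrate the pattern described above, here is a rough sketch; the resource names and image arguments are assumptions based on this thread (var.img_name comes from the reproduction command), not the actual demo code:

# The image name changes on every build (it embeds a timestamp), so the second
# apply plans to destroy and recreate the image.
resource "google_compute_image" "app" {
  name = var.img_name # e.g. "capua-master-20200214-2135", a new value each run

  raw_disk {
    source = var.image_source_url # hypothetical; any change forcing recreation triggers the issue
  }
}

resource "google_compute_instance_template" "default" {
  # ...
  disk {
    # On the second apply this self_link is only "known after apply", and the
    # provider's custom instance-template code did not handle that, causing the panic.
    source_image = google_compute_image.app.self_link
    auto_delete  = true
    boot         = true
  }
}

The suggested workaround is to create the image once in its final state (for example, with a stable name) so the template's source_image reference does not become an unknown value on subsequent applies.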

hallvors (Author) commented Mar 3, 2020

Thanks a lot for your work on this, @c2thorn ! 🙇‍♂️

ghost commented Apr 2, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

@ghost ghost locked and limited conversation to collaborators Apr 2, 2020