Terraform crash on plan/apply/destroy operation on google-beta container cluster #4756

Closed
chrissng opened this issue Oct 25, 2019 · 10 comments
Labels
bug crash forward/review In review; remove label to forward service/container

Comments

@chrissng
Contributor

chrissng commented Oct 25, 2019

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
  • If an issue is assigned to the "modular-magician" user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to "hashibot", a community member has claimed the issue already.

Terraform Version

Terraform v0.12.11
+ provider.google v2.18.1
+ provider.google-beta v2.18.1
+ provider.kubernetes v1.7.1
+ provider.null v2.1.2
+ provider.random v2.2.1

Affected Resource(s)

  • google_container_cluster

Terraform Configuration Files

I am using the terraform-google-kubernetes-engine tf module:

module "gke" {
  source  = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster"
  version = "~> 5.0"

  project_id  = var.project_id
  name        = var.cluster_name
  description = var.cluster_description

  region   = var.region
  regional = var.regional
  zones    = var.zones

  network            = var.network_name
  subnetwork         = var.subnetwork
  ip_range_pods      = var.ip_range_pods
  ip_range_services  = var.ip_range_services
  network_project_id = var.project_id

  ip_masq_link_local = "false"

  http_load_balancing        = false
  horizontal_pod_autoscaling = true
  kubernetes_dashboard       = false
  network_policy             = false

  kubernetes_version     = var.kubernetes_version
  maintenance_start_time = var.maintenance_start_time

  monitoring_service = var.monitoring_service
  logging_service    = var.logging_service

  enable_private_endpoint = false
  enable_private_nodes    = true
  master_ipv4_cidr_block  = var.master_ipv4_cidr_block

  master_authorized_networks_config = [
    {
      cidr_blocks = var.master_access_cidrs
    },
  ]

  # We do not need to create the default service account
  create_service_account = false
  service_account        = local.gke_service_account

  remove_default_node_pool = true

  node_pools = [
    {
      name               = var.default_node_pool_name
      machine_type       = var.default_node_pool_machine_type
      min_count          = var.default_node_pool_min_count
      max_count          = var.default_node_pool_max_count
      initial_node_count = var.default_node_pool_min_count
      auto_repair        = true
      auto_upgrade       = true
      disk_size_gb       = var.default_node_pool_disk_size_gb
      disk_type          = var.default_node_pool_disk_type
      image_type         = "COS"
    },
  ]

  node_pools_labels = {
    all                             = local.all_node_pools_labels
    "${var.default_node_pool_name}" = var.default_node_pool_labels
  }

  node_pools_metadata = {
    all                             = local.all_node_pools_metadata
    "${var.default_node_pool_name}" = var.default_node_pool_metadata
  }

  node_pools_taints = {
    all                             = []
    "${var.default_node_pool_name}" = var.default_node_pool_taints
  }

  node_pools_tags = {
    all                             = local.all_node_pools_tags
    "${var.default_node_pool_name}" = var.default_node_pool_tags
  }

  node_pools_oauth_scopes = {
    all                             = ["https://www.googleapis.com/auth/cloud-platform"]
    "${var.default_node_pool_name}" = ["https://www.googleapis.com/auth/cloud-platform"]
  }

  identity_namespace = "${var.project_id}.svc.id.goog"

  node_metadata = "UNSPECIFIED"
}

Debug Output

N/A

Panic Output

https://gist.github.com/chrissng/c01959daad9050de7339ab44bca03663#file-crash-log


Error: rpc error: code = Unavailable desc = transport is closing




!!!!!!!!!!!!!!!!!!!!!!!!!!! TERRAFORM CRASH !!!!!!!!!!!!!!!!!!!!!!!!!!!!

Terraform crashed! This is always indicative of a bug within Terraform.
A crash log has been placed at "crash.log" relative to your current
working directory. It would be immensely helpful if you could please
report the crash with Terraform[1] so that we can fix this.

When reporting bugs, please include your terraform version. That
information is available on the first line of crash.log. You can also
get it by running 'terraform --version' on the command line.

[1]: https://github.com/hashicorp/terraform/issues

!!!!!!!!!!!!!!!!!!!!!!!!!!! TERRAFORM CRASH !!!!!!!!!!!!!!!!!!!!!!!!!!!!
panic: runtime error: invalid memory address or nil pointer dereference
2019-10-29T10:57:34.153+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1b88522]
2019-10-29T10:57:34.153+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: 
2019-10-29T10:57:34.153+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: goroutine 90 [running]:
2019-10-29T10:57:34.153+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: github.com/terraform-providers/terraform-provider-google-beta/google-beta.resourceContainerClusterRead(0xc0000eeb60, 0x205eb80, 0xc0000cf600, 0xc0000eeb60, 0x0)
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: 	/opt/teamcity-agent/work/5d79fe75d4460a2f/src/github.com/terraform-providers/terraform-provider-google-beta/google-beta/resource_container_cluster.go:1198 +0xfe2
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: github.com/hashicorp/terraform-plugin-sdk/helper/schema.(*Resource).RefreshWithoutUpgrade(0xc000205480, 0xc0004cc1e0, 0x205eb80, 0xc0000cf600, 0xc000abae10, 0x0, 0x0)
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: 	/opt/teamcity-agent/work/5d79fe75d4460a2f/pkg/mod/github.com/hashicorp/terraform-plugin-sdk@v1.0.0/helper/schema/resource.go:455 +0x119
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: github.com/hashicorp/terraform-plugin-sdk/internal/helper/plugin.(*GRPCProviderServer).ReadResource(0xc00000f0c0, 0x2849bc0, 0xc000a0e7b0, 0xc0000adf40, 0xc00000f0c0, 0xc000a0e7b0, 0xc000710a80)
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: 	/opt/teamcity-agent/work/5d79fe75d4460a2f/pkg/mod/github.com/hashicorp/terraform-plugin-sdk@v1.0.0/internal/helper/plugin/grpc_provider.go:525 +0x3d8
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: github.com/hashicorp/terraform-plugin-sdk/internal/tfplugin5._Provider_ReadResource_Handler(0x23d3420, 0xc00000f0c0, 0x2849bc0, 0xc000a0e7b0, 0xc000503380, 0x0, 0x2849bc0, 0xc000a0e7b0, 0xc000a22a00, 0x24c7)
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: 	/opt/teamcity-agent/work/5d79fe75d4460a2f/pkg/mod/github.com/hashicorp/terraform-plugin-sdk@v1.0.0/internal/tfplugin5/tfplugin5.pb.go:3153 +0x217
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: google.golang.org/grpc.(*Server).processUnaryRPC(0xc0000be160, 0x287c1c0, 0xc00038e480, 0xc0005c5600, 0xc0006243c0, 0x367dd90, 0x0, 0x0, 0x0)
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: 	/opt/teamcity-agent/work/5d79fe75d4460a2f/pkg/mod/google.golang.org/grpc@v1.23.0/server.go:995 +0x460
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: google.golang.org/grpc.(*Server).handleStream(0xc0000be160, 0x287c1c0, 0xc00038e480, 0xc0005c5600, 0x0)
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: 	/opt/teamcity-agent/work/5d79fe75d4460a2f/pkg/mod/google.golang.org/grpc@v1.23.0/server.go:1275 +0xd97
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc000476470, 0xc0000be160, 0x287c1c0, 0xc00038e480, 0xc0005c5600)
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: 	/opt/teamcity-agent/work/5d79fe75d4460a2f/pkg/mod/google.golang.org/grpc@v1.23.0/server.go:710 +0xbb
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: created by google.golang.org/grpc.(*Server).serveStreams.func1
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: 	/opt/teamcity-agent/work/5d79fe75d4460a2f/pkg/mod/google.golang.org/grpc@v1.23.0/server.go:708 +0xa1
2019-10-29T10:57:34.155+0800 [DEBUG] plugin: plugin process exited: path=/home/ubuntu/span-gcp-internal/environments/staging/workloads/staging/workload_gke/.terragrunt-cache/abrU2smPxLnsOYT7awftOrehozI/6rEP-yvQYmtwsPrcDCguWsdm3kU/modules/workload_gke/.terraform/plugins/linux_amd64/terraform-provider-google-beta_v2.18.1_x4 pid=118582 error="exit status 2"
2019/10/29 10:57:34 [ERROR] module.workload_gke.module.gke: eval: *terraform.EvalRefresh, err: rpc error: code = Unavailable desc = transport is closing
2019/10/29 10:57:34 [ERROR] module.workload_gke.module.gke: eval: *terraform.EvalSequence, err: rpc error: code = Unavailable desc = transport is closing
2019/10/29 10:57:34 [TRACE] [walkRefresh] Exiting eval tree: module.workload_gke.module.gke.google_container_cluster.primary
2019/10/29 10:57:34 [TRACE] vertex "module.workload_gke.module.gke.google_container_cluster.primary": visit complete
2019/10/29 10:57:34 [TRACE] vertex "module.workload_gke.module.gke.google_container_cluster.primary": dynamic subgraph encountered errors
2019/10/29 10:57:34 [TRACE] vertex "module.workload_gke.module.gke.google_container_cluster.primary": visit complete
2019/10/29 10:57:34 [TRACE] dag/walk: upstream of "module.workload_gke.module.gke.output.endpoint" errored, so skipping
2019/10/29 10:57:34 [TRACE] dag/walk: upstream of "module.workload_gke.output.endpoint" errored, so skipping
2019/10/29 10:57:34 [TRACE] dag/walk: upstream of "output.endpoint" errored, so skipping
2019/10/29 10:57:34 [TRACE] dag/walk: upstream of "provider.google-beta (close)" errored, so skipping
2019/10/29 10:57:34 [TRACE] dag/walk: upstream of "root" errored, so skipping
2019-10-29T10:57:34.198+0800 [DEBUG] plugin: plugin exited
2019-10-29T10:57:34.201+0800 [DEBUG] plugin: plugin process exited: path=/usr/local/bin/terraform pid=118518
2019-10-29T10:57:34.201+0800 [DEBUG] plugin: plugin exited

[terragrunt] 2019/10/29 10:57:34 Hit multiple errors:
exit status 1

Expected Behavior

The plan should succeed

Actual Behavior

The terraform google-beta provider crashed

Steps to Reproduce

  1. terraform plan

Important Factoids

Terraform is run with Terragrunt. I have multiple GKE clusters set up using the same Terraform module; however, only one particular cluster (this one) crashes.

Issue persists without using Terragrunt (using terraform directly).

@ghost ghost added bug crash labels Oct 25, 2019
@bluemalkin

bluemalkin commented Oct 28, 2019

I also get the error message "Error: rpc error: code = Unavailable desc = transport is closing" using v2.18. I cannot tell which resource is causing it though.

@tysen

tysen commented Oct 28, 2019

Have you filed an issue with the Terraform core repo and submitted the crash.log file?

@chrissng
Contributor Author

chrissng commented Oct 29, 2019

@tysen I believe this is not necessary as the TF devs would redirect us back to the provider.

I've bumped the google-beta provider to the latest version, but the issue persists. The stack trace shows a nil pointer dereference in resourceContainerClusterRead, at this line: https://github.com/terraform-providers/terraform-provider-google-beta/blob/release-2.18.1/google-beta/resource_container_cluster.go#L1198
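
For anyone following along, here is a minimal sketch of this class of crash and the usual guard against it, assuming the failing line reads a field through a nested pointer that can be nil. The `Resource` and `Feature` types and the `featureEnabled` helper are hypothetical stand-ins for illustration, not the provider's actual code.

```go
package main

import "fmt"

// Feature is a hypothetical optional nested block in an API response.
type Feature struct {
	Enabled bool
}

// Resource is a hypothetical API object. Feature is a pointer, so it is
// nil whenever the API leaves the nested block out of the response.
type Resource struct {
	Name    string
	Feature *Feature
}

// featureEnabled reads the nested field defensively. Reading
// r.Feature.Enabled directly would panic with "invalid memory address or
// nil pointer dereference" when Feature is nil.
func featureEnabled(r *Resource) bool {
	if r == nil || r.Feature == nil {
		return false
	}
	return r.Feature.Enabled
}

func main() {
	withoutBlock := &Resource{Name: "no-nested-block"}
	withEmptyBlock := &Resource{Name: "empty-nested-block", Feature: &Feature{}}
	withEnabled := &Resource{Name: "enabled", Feature: &Feature{Enabled: true}}

	fmt.Println(featureEnabled(withoutBlock))
	fmt.Println(featureEnabled(withEmptyBlock))
	fmt.Println(featureEnabled(withEnabled))
}
```

With a guard like this, a missing nested block is treated the same as a disabled one instead of crashing the plugin process.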

@chrissng
Contributor Author

chrissng commented Oct 29, 2019

I took a look at the REST API responses.

On a working cluster, the response contains an empty shieldedNodes object, and Terraform operations work: https://gist.githubusercontent.com/chrissng/c01959daad9050de7339ab44bca03663/raw/63275eeabc54cf53d56517915c7fe14a078ff0e6/working_cluster_response.json

On the cluster that has the crashing issue (this), the response does not contain a shieldedNodes object:
https://gist.github.com/chrissng/c01959daad9050de7339ab44bca03663/raw/63275eeabc54cf53d56517915c7fe14a078ff0e6/crash_cluster_response.json

Neither of these clusters has shielded nodes configured, so it is unclear why the REST API returns different results.
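
To illustrate why the two responses behave differently on the client side, here is a small sketch using Go's standard `encoding/json` package. The `clusterResponse` and `shieldedNodes` types and the `hasShieldedNodes` helper are assumptions made for this example; the real API client types are larger and may differ. The point is only that an empty object decodes to a non-nil pointer, while an absent key leaves the pointer nil.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// shieldedNodes is a hypothetical, trimmed-down view of the nested object.
type shieldedNodes struct {
	Enabled bool `json:"enabled"`
}

// clusterResponse is a hypothetical, trimmed-down view of the cluster
// response. The nested object is a pointer, so it can be absent.
type clusterResponse struct {
	Name          string         `json:"name"`
	ShieldedNodes *shieldedNodes `json:"shieldedNodes"`
}

// hasShieldedNodes decodes a response body and reports whether the
// shieldedNodes object was present at all, even if empty.
func hasShieldedNodes(body string) (bool, error) {
	var c clusterResponse
	if err := json.Unmarshal([]byte(body), &c); err != nil {
		return false, err
	}
	return c.ShieldedNodes != nil, nil
}

func main() {
	bodies := []string{
		`{"name": "working", "shieldedNodes": {}}`, // empty object present
		`{"name": "crashing"}`,                     // key missing entirely
	}
	for _, body := range bodies {
		present, err := hasShieldedNodes(body)
		fmt.Println(present, err)
	}
}
```

Under these assumptions, code that reads a field of the nested object without a nil check works for the first body and panics for the second, which matches the behaviour of the two clusters.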

@chrissng
Contributor Author

chrissng commented Oct 29, 2019

This PR should fix the issue: GoogleCloudPlatform/magic-modules#2555

@pdecat
Contributor

pdecat commented Oct 29, 2019

Same issue here, my work-around was to force update the cluster with gcloud to get the empty shieldedNodes object in the API response:

# gcloud beta container clusters update my-cluster --zone my-zone --no-enable-shielded-nodes

@joemiller
Contributor

Pinning the google-beta provider to 2.17.0 also seems to be a temp fix, fwiw

@bluemalkin

"Pinning the google-beta provider to 2.17.0 also seems to be a temp fix, fwiw"

That's what I have also done.

@tysen

tysen commented Oct 30, 2019

GoogleCloudPlatform/magic-modules#2555 is merged. Fix should be in next release.

@ghost

ghost commented Nov 30, 2019

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

@ghost ghost locked and limited conversation to collaborators Nov 30, 2019
@github-actions github-actions bot added service/container forward/review In review; remove label to forward labels Jan 15, 2025