
Getting an intermittent failed to refresh Bearer token error when trying to delete my AKS cluster #2602

Closed
btai24 opened this issue Jan 4, 2019 · 20 comments · Fixed by #4775

Comments

@btai24

btai24 commented Jan 4, 2019

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureRM Provider) Version

Terraform v0.11.10
AzureRM Provider v1.20.0

Affected Resource(s)

  • azurerm_kubernetes_cluster

Terraform Configuration Files

resource "azurerm_kubernetes_cluster" "aks_cluster" {
  name       = "${var.name}"
  location   = "${var.region}"
  dns_prefix = "${var.name}"

  kubernetes_version  = "${var.kubernetes_version}"
  resource_group_name = "${azurerm_resource_group.aks_resource_group.name}"

  linux_profile {
    admin_username = "xxx"

    ssh_key {
      key_data = "${var.ssh_public_key}"
    }
  }

  agent_pool_profile {
    count = "${var.node_count}"

    name            = "agentpool"
    vm_size         = "${var.vm_size}"
    os_disk_size_gb = "${var.os_disk_size}"
    os_type         = "Linux"
    vnet_subnet_id  = "${azurerm_subnet.private.id}"
    max_pods        = 110
  }

  service_principal {
    client_id     = "${azurerm_azuread_service_principal.service_principal.application_id}"
    client_secret = "${random_string.service_principal_password.result}"
  }

  role_based_access_control {
    enabled = true

    azure_active_directory {
      client_app_id     = "${var.rbac_client_app_id}"
      server_app_id     = "${var.rbac_server_app_id}"
      server_app_secret = "${var.rbac_server_app_secret}"
    }
  }

  network_profile {
    network_plugin = "azure"
  }

  depends_on = [
    "azurerm_azuread_service_principal.service_principal",
    "azurerm_azuread_service_principal_password.password",
  ]

  tags {
    environment = "${var.environment}"
    name        = "${var.name}"
  }
}

Debug Output

Unfortunately this happens intermittently, so I haven't been able to get debug output. It started happening after I upgraded to AzureRM Provider v1.20, but I'm not sure if there is a connection.
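
If it recurs, a full trace can be captured ahead of time by exporting Terraform's logging environment variables before running the destroy; a minimal sketch (the log file path is a placeholder):

export TF_LOG=DEBUG
export TF_LOG_PATH=./terraform-destroy.log
terraform destroy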

Expected Behavior

Running terraform destroy should successfully delete the terraform provisioned AKS cluster on the first attempt.

Actual Behavior

Running terraform destroy does not always successfully delete the terraform provisioned AKS cluster on the first attempt. It always succeeds on a second attempt.

The error produced:

Error: Error applying plan:

1 error(s) occurred:

* module.aks_cluster.azurerm_kubernetes_cluster.aks_cluster (destroy): 1 error(s) occurred:

* azurerm_kubernetes_cluster.aks_cluster: Error waiting for the deletion of Managed Kubernetes Cluster "test-westus2" (Resource Group "aks-rg-test-westus2"): azure.BearerAuthorizer#WithAuthorization: 
Failed to refresh the Token for request to https://management.azure.com/subscriptions/<subscription_id>providers/Microsoft.ContainerService/locations/westus2/operations/<id>?api-version=2016-03-30: StatusCode=0 -- 
Original Error: Manually created ServicePrincipalToken does not contain secret material to retrieve a new access token

Steps to Reproduce

This unfortunately happens intermittently. But running a terraform destroy on an AKS cluster sometimes results in the error above.

@torumakabe

The same error has happened outside of AKS cluster creation/deletion as well. It seems to occur during long-running plan/apply operations. The following is an example from a Resource Group deletion at the end of a long-running apply.

Error: Error applying plan:

1 error(s) occurred:

* azurerm_resource_group.shared (destroy): 1 error(s) occurred:

* azurerm_resource_group.shared: Error deleting Resource Group "myrg": azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to
https://management.azure.com/subscriptions/myid/operationresults/myresult?api-version=2018-05-01: StatusCode=0 -- Original Error: Manually created ServicePrincipalToken does not contain secret material to retrieve a new access token

@katbyte Do you have any advice?

@tombuildsstuff
Contributor

hi @btai24 @torumakabe

Thanks for opening this issue :)

This appears to be a bug in the authentication logic which we use to connect to Azure (specifically how it handles refreshing tokens) - as such this would require a bug-fix there (which is handled in this repository: https://github.com/hashicorp/go-azure-helpers). So that we're able to diagnose this further - would it be possible to know which method you're using to authenticate with Azure from Terraform (e.g. the Azure CLI / a Service Principal with a Client Secret etc)?

Thanks!
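
For reference, the two methods mentioned above are set up roughly like this; a minimal sketch, with all IDs and secrets as placeholders:

# Azure CLI authentication: sign in once and the provider reuses the CLI's token
az login

# Service Principal with a Client Secret, supplied via environment variables
export ARM_SUBSCRIPTION_ID="00000000-0000-0000-0000-000000000000"
export ARM_TENANT_ID="00000000-0000-0000-0000-000000000000"
export ARM_CLIENT_ID="00000000-0000-0000-0000-000000000000"
export ARM_CLIENT_SECRET="<client-secret>"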

@torumakabe

@tombuildsstuff Thanks! I use Azure CLI.

@tombuildsstuff
Contributor

@torumakabe thanks for confirming that. Since this appears to be an issue in the upstream library I've created an upstream issue for this: hashicorp/go-azure-helpers#22

@amasover
Contributor

It looks like Azure/go-autorest#476 was just recently merged in, so once it gets incorporated downstream this issue should be fixed.

@tombuildsstuff
Contributor

@amasover yeah, we've got a PR ready to go into the base library to fix this; it's just waiting on a release of go-autorest, which looks like it's happening soon-ish :)

@ghost

ghost commented Nov 26, 2019

This has been released in version 1.37.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:

provider "azurerm" {
    version = "~> 1.37.0"
}
# ... other configuration ...
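
After relaxing the version constraint, the newer provider release can be pulled in by re-initialising the working directory; a minimal sketch:

terraform init -upgrade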

@dl888888

We upgraded to version 1.44.0 of azurerm and now I'm seeing this problem for the first time. Anyone else experiencing this?

This is what I have now:
Terraform v0.12.21

provider.azuread v0.7.0
provider.azurerm v1.44.0
provider.random v2.2.1

@alexyakunin

Same for me.

@ghost

ghost commented Feb 29, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

@ghost ghost locked and limited conversation to collaborators Feb 29, 2020