Call to http://localhost/version with configured host and credentials #708

Closed
dploeger opened this issue Dec 13, 2019 · 62 comments

@dploeger

Terraform Version

Terraform v0.12.17

  • provider.azurerm v1.38.0
  • provider.kubernetes v1.10.0

Affected Resource(s)


  • kubernetes_persistent_volume

Terraform Configuration Files

provider "kubernetes" {
  version                = "~> 1.10.0"
  host                   = module.azurekubernetes.host
  username               = module.azurekubernetes.username
  password               = module.azurekubernetes.password
  client_certificate     = base64decode(module.azurekubernetes.client_certificate)
  client_key             = base64decode(module.azurekubernetes.client_key)
  cluster_ca_certificate = base64decode(module.azurekubernetes.cluster_ca_certificate)
}

resource "kubernetes_persistent_volume" "factfinder-pv" {
  metadata {
    name = "ff-nfs-client"
    labels = {
      type          = "factfinder"
      sub_type      = "nfs"
      instance_type = "pv"
    }
  }
  spec {
    access_modes = ["ReadWriteMany"]
    capacity = map("storage", "${var.shared_storage_size}Gi")

    persistent_volume_source {
      nfs {
        path   = "/"
        server = var.nfs_service_ip
      }
    }
    storage_class_name = "nfs"
  }
}

Debug Output

(The debug output is huge and I just pasted a relevant section of it. If you need more, I'll create a gist)

2019/12/13 09:45:42 [DEBUG] ReferenceTransformer: "module.factfinder.kubernetes_service.factfinder-fffui-service" references: []
2019/12/13 09:45:42 [DEBUG] ReferenceTransformer: "module.loadbalancer.kubernetes_config_map.tcp-services" references: []
2019/12/13 09:45:42 [DEBUG] ReferenceTransformer: "module.factfinder.kubernetes_deployment.factfinder-sftp" references: []
2019-12-13T09:45:42.985Z [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: ---[ REQUEST ]---------------------------------------
2019-12-13T09:45:42.985Z [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: GET /version?timeout=32s HTTP/1.1
2019-12-13T09:45:42.985Z [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: Host: localhost
2019-12-13T09:45:42.985Z [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: User-Agent: HashiCorp/1.0 Terraform/0.12.17
2019-12-13T09:45:42.985Z [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: Accept: application/json, */*
2019-12-13T09:45:42.985Z [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: Accept-Encoding: gzip
2019-12-13T09:45:42.985Z [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4:
2019-12-13T09:45:42.986Z [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4:
2019-12-13T09:45:42.986Z [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: -----------------------------------------------------

Expected Behavior

When running Terraform in the hashicorp/terraform container, terraform plan should run properly.

Actual Behavior

The plan errors out with the following error:

Error: Get http://localhost/version?timeout=32s: dial tcp 127.0.0.1:80: connect: connection refused

  on ../modules/factfinder/factfinder-nfs-client-pv.tf line 6, in resource "kubernetes_persistent_volume" "factfinder-pv":
   6: resource "kubernetes_persistent_volume" "factfinder-pv" {

This only happens when running Terraform in the container. When run locally, everything is fine (even when the local .kube directory is removed).

Steps to Reproduce


  1. terraform plan or terraform apply

Important Factoids

  • Running in Azure AKS
  • Running in a Docker container based on the hashicorp/terraform image
@dploeger
Author

Hm. Now I've somehow gotten my local environment into a state where this happens too. 🤷‍♂

@mtekel

mtekel commented Dec 18, 2019

Happens for me as well. I changed a kubernetes_secret metadata name from a plain string to an interpolated value... which resolves to the same string. The original has no issue; the interpolated one connects to localhost...

resource "kubernetes_secret" "vault-gcp" {
  metadata {
    name = "${var.deployment_name}-gcp"
  }
...
}

When the name is the literal "vault-gcp", it's fine. In a new branch with the code above and the deployment name set to "vault", so that the resulting interpolation is also "vault-gcp", this fails with a connection to localhost.

Seems like Terraform/the provider thinks this is some new/different instance of the resource which somehow does not belong to the configured Kubernetes cluster, so it probably falls back to the default "localhost" address.

@dploeger
Author

I have no interpolated values in metadata, only in the spec. But that is the case for all of my Kubernetes resources, and only the resource mentioned above has the problem (or it is just the first one Terraform comes across before stopping; that could be the case as well).

Quite the phenomenon, really. Any core developer around? 😉

@mtekel

mtekel commented Dec 18, 2019

Tried a workaround with a conditional:

name = var.legacy == "yes" ? "vault-gcp" : "${var.deployment_name}-gcp"

This way I wanted to use a non-interpolated string directly in some cases, but it ended up with the same issue. My TF version is 0.12.18. I have the kubernetes provider configured with host and credentials:

provider "kubernetes" {
  host  = google_container_cluster.vault.endpoint
  token = data.google_client_config.current.access_token
  cluster_ca_certificate = base64decode(
    ....
  )
  load_config_file = false
}

Then I tried another workaround: defining two resources, one with the interpolation and one with the plain string, and then controlling which resource actually gets deployed with

count =  var.legacy == "yes" ? 1 : 0

But this ended up with a new resource index [0] even for the legacy deployment (where it is already deployed and I am trying to achieve zero changes on terraform apply).

So I would say the issue is that the existing kubernetes provider config is somehow not respected for new resources...

kubernetes_secret.vault-gcp[0]: Refreshing state... [id=default/vault-gcp]
...

Error: Get http://localhost/api/v1/namespaces/default/secrets/vault-gcp: dial tcp [::1]:80: connect: connection refused

@dploeger
Author

I think it's interesting that it even tries to call via HTTP and not HTTPS, which I would expect to be the default.

@mtekel

mtekel commented Dec 19, 2019

So it turns out that in my case I was also pointing to the wrong location in the bucket, where there was no tfstate. As most resources in GCP have the same ID as their name, Terraform was able to find and refresh my whole stack even without state, except for the Kubernetes secrets, where it connected to localhost because it had no state telling it where the cluster was...

In EC2 that would probably blow up sooner, as resource IDs are quite different from resource names, and if you lose state you have a lot of trouble finding where everything is...

@dploeger
Author

Okay, I found the problem for my case. This line here:

https://github.com/terraform-providers/terraform-provider-kubernetes/blob/45d910a26f17f7b03d684221428b86f2f02b5be2/kubernetes/resource_kubernetes_persistent_volume.go#L40

If you remove the whole CustomizeDiff part, everything works fine. So I guess the correct server isn't carried through to that point. I'll try to dig deeper there.

@dploeger
Author

@alexsomesan @pdecat You added that line while refactoring the whole client handling. Can you think of any implications that could cause this behaviour? It seems as if the MainClientset isn't correctly configured by the time it reaches the CustomizeDiff function.

@pdecat
Contributor

pdecat commented Dec 19, 2019

Hi @dploeger, I believe the initialization here occurs too early. The CustomizeDiff probably needs to be replaced by a CustomizeDiffFunc.

@dploeger
Author

@pdecat You probably know how to do this. I just stumbled through the code. 😆 Are you able to provide a PR for that?

@dploeger
Author

Or can you point me to how to implement that? Just replacing CustomizeDiff with CustomizeDiffFunc didn't work, at least. :)

@pdecat
Contributor

pdecat commented Dec 19, 2019

Never mind, it won't work, CustomizeDiffFunc is the type of the CustomizeDiff field.

Let me think of something else.

@alexsomesan
Member

@dploeger Are you building the AKS resources from module.azurekubernetes in the same apply run as the kubernetes_persistent_volume ?

@dploeger
Author

Yes, I am. And that all worked until 12-9. I can’t really grasp what has changed then, because we didn’t update or change anything there.

@pdecat
Contributor

pdecat commented Dec 19, 2019

@dploeger Are you building the AKS resources from module.azurekubernetes in the same apply run as the kubernetes_persistent_volume ?

Good point, that's the most frequent issue when localhost is involved. The configuration is not available at the time the kubernetes provider is initialized.
The point about removing CustomizeDiff fixing the issue made me think of something else, but it turns out the kubernetes client is only initialized once by the provider.
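
One common way to work around this ordering problem in practice (a general note, not something prescribed in this thread) is to create the cluster in a first, targeted run and only apply the Kubernetes resources once the cluster exists, e.g.:

terraform apply -target=module.azurekubernetes
terraform apply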

@alexsomesan
Member

Further question: is this happening when running TF in a Pod on the cluster?

@dploeger
Author

Ummmm... I haven't tried that. Is that important? I'd have to set that up. I just tried locally. It also happens outside the container now.

@jakexks

jakexks commented Jan 30, 2020

I'm experiencing this with a module that nests other modules. Sometimes the child modules lose the provider configuration and the Terraform config becomes un-applyable, but also un-destroyable!

The parent creates a DigitalOcean Kubernetes cluster inside a module, then uses the output of the module to get a data source which configures the provider e.g.

module "e2etest_k8s" {
  source = "./infrastructure/kubernetes/do"
  providers = {
    digitalocean = digitalocean.e2etest
  }
}

data "digitalocean_kubernetes_cluster" "e2etest" {
  provider = digitalocean.e2etest
  name     = module.e2etest_k8s.cluster_name
}

provider "kubernetes" {
  alias            = "e2etest"
  load_config_file = false
  host             = data.digitalocean_kubernetes_cluster.e2etest.endpoint
  token            = data.digitalocean_kubernetes_cluster.e2etest.kube_config[0].token
  cluster_ca_certificate = base64decode(
    data.digitalocean_kubernetes_cluster.e2etest.kube_config[0].cluster_ca_certificate
  )
}

// This also contains submodules
module "<rest of infra>" {
  source = "./<folders>"
  providers = {
    kubernetes = kubernetes.e2etest
  }
}

This provider is then used for a bunch of modules (which also contain modules) that then exhibit the localhost behavior (sometimes, but it seems deterministic between runs).

@nothingofuse

nothingofuse commented Feb 11, 2020

Any updates on this? I'm trying to upgrade from 7.0.1 to 8.2.0 of the EKS Terraform module (https://github.com/terraform-aws-modules/terraform-aws-eks). I'm able to get through the initial import of the aws-auth configmap by using a local kubeconfig the first time (overriding load_config_file to true for the import), but subsequent plans always fail with a call to localhost.

My provider config looks like:

provider "kubernetes" {
  load_config_file       = var.load_config 
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
  version                = "1.10.0" # Stable version??
}

data "aws_eks_cluster" "cluster" {
  name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.eks.cluster_id
}

Error: Get http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth: dial tcp [::1]:80: connect: connection refused
module.eks.kubernetes_config_map.aws_auth[0]: Refreshing state... [id=kube-system/aws-auth]
2020-02-11T10:16:22.087-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: 2020/02/11 10:16:22 [INFO] Checking config map aws-auth
2020-02-11T10:16:22.087-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: 2020/02/11 10:16:22 [DEBUG] Kubernetes API Request Details:
2020-02-11T10:16:22.087-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: ---[ REQUEST ]---------------------------------------
2020-02-11T10:16:22.087-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: GET /api/v1/namespaces/kube-system/configmaps/aws-auth HTTP/1.1
2020-02-11T10:16:22.087-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: Host: localhost
2020-02-11T10:16:22.087-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: User-Agent: HashiCorp/1.0 Terraform/0.12.20
2020-02-11T10:16:22.087-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: Accept: application/json, */*
2020-02-11T10:16:22.087-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: Accept-Encoding: gzip
2020-02-11T10:16:22.087-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4:
2020-02-11T10:16:22.087-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4:
2020-02-11T10:16:22.087-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: -----------------------------------------------------
2020-02-11T10:16:22.089-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: 2020/02/11 10:16:22 [DEBUG] Received error: &url.Error{Op:"Get", URL:"http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth", Err:(*net.OpError)(0xc000976050)}
2020/02/11 10:16:22 [ERROR] module.eks: eval: *terraform.EvalRefresh, err: Get http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth: dial tcp [::1]:80: connect: connection refused
2020/02/11 10:16:22 [ERROR] module.eks: eval: *terraform.EvalSequence, err: Get http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth: dial tcp [::1]:80: connect: connection refused

I'm happy to provide further information/logs/tests to get this issue resolved ASAP. I have tried provider versions 1.8.1, 1.9.0, 1.10.0 and 1.11.0 (1.11.0 gives me a different error, corresponding to issue #759). I'm using Terraform 0.12.20.

@hazcod
Contributor

hazcod commented Feb 12, 2020

Having the same issue: I use the Scaleway Kapsule provider's kubeconfig output as input for my kubernetes Terraform provider. Using a local kubeconfig does not resolve the issue during terraform plan. https://github.com/ironPeakServices/infrastructure/runs/435886375?check_suite_focus=true

@brpaz

brpaz commented Feb 16, 2020

I have the exact problem of @jakexks and @hazcod. Everything was working when I had everything in the root module, but when I split it out into a separate module, it started giving errors saying "Error: invalid configuration: no configuration has been provided" as well as trying to connect using localhost.

@hazcod
Contributor

hazcod commented Feb 17, 2020

@brpaz: so it works if you run it from the root module?
Might be an overall Terraform issue, since I had the problem that some Terraform variables were not being set for submodules, forcing me to set them in the root module too, e.g.: https://github.com/ironPeakServices/infrastructure/blob/master/versions.tf#L20

@brpaz

brpaz commented Feb 17, 2020

@hazcod yes, I had all my Terraform resources in main.tf in the root module. Everything was working.
Because the configs were growing, I created a module and split my main.tf into several files inside the module. After that change, running terraform apply started giving these errors.

But then I tried a fresh install (clean state and a new cluster provisioned from scratch) and it worked.
I think a conflict between what was persisted in the state file and the new Terraform declarations somehow resulted in Terraform picking the wrong config?

@hazcod
Contributor

hazcod commented Feb 18, 2020

This might be related to hashicorp/terraform#24131

@hazcod
Contributor

hazcod commented Feb 21, 2020

After reaching out to Terraform core, the above issue seems to indicate that this is a kubernetes provider problem: the provider is not handling unknown variables well.

@hazcod
Contributor

hazcod commented Feb 25, 2020

I have drilled this down to the following: if the kubernetes provider receives unknown values (because of a dependency), it should still go through with the plan, because those values would normally be filled in during the apply phase. I think that's a better approach than just erroring out, as it does now.

@hazcod
Contributor

hazcod commented Feb 26, 2020

This is really frustrating. If my Scaleway cluster is removed, I have to take the following manual steps:

  • Comment out kubernetes/helm provider code
  • Trigger the pipeline deploy
  • Re-enable kubernetes/helm provider code
  • Hope everything goes well or start over again

@hazcod
Contributor

hazcod commented Feb 27, 2020

I circumvented this with:

provider "kubernetes" {
    # fixed to 1.10.0 because of https://github.com/terraform-providers/terraform-provider-kubernetes/issues/759
    version = "1.10.0" 
    # set the variable in the root module or else we have a dependency issue
    token = module.scaleway.token
}

@liangyungong

I'm confused, your terraform providers output has eks and this grep output has aks.

They're irrelevant files; that's just how the modules are organised in the git repo. :)

@pdecat
Contributor

pdecat commented Mar 12, 2020

@liangyungong I still do not get how you can have AWS resources in the terraform providers output and Azure resources in the grep output. They do not correspond to each other.

Your terraform providers output explicitly states that there's a kubernetes provider initialized in the AWS EKS module that is not inherited from the root module:

.
├── provider.aws ~> 2.44.0
├── provider.kubernetes ~> 1.10
├── provider.terraform
└── module.cluster
    ├── provider.aws (inherited)
    ├── module.alb_ingress_controller_iam_policy
    │   └── provider.aws (inherited)
    ├── module.eks
    │   ├── provider.aws >= 2.38.0
    │   ├── provider.kubernetes >= 1.6.2 # <-- HERE
[...]

That means there is a provider "kubernetes" block in there.

Can you check the content of that module?

@liangyungong

Can you check the content of that module?

There're many other modules in the same git repo, and they are irrelevant to the module that I use. Whenever I do terraform init, it clones the whole git repo.

@pdecat
Contributor

pdecat commented Mar 13, 2020

There're many other modules in the same git repo, and they are irrelevant to the module that I use. Whenever I do terraform init, it clones the whole git repo.

So the module.eks provider block does not have load_config_file = false.

@dsymonds
Contributor

I'm hitting this problem, but not with any modules.

$ terraform providers
.
├── provider.google ~> 3.13
├── provider.google-beta ~> 3.13
├── provider.kubernetes.xxx ~> 1.11.1
└── provider.kubernetes.yyy ~> 1.11.1

(two separate kubernetes providers with aliases)

Is there a known workaround that doesn't involve winding back the kubernetes provider to 1.10? I need to be using 1.11 for other reasons.

@dsymonds
Contributor

Actually my setup has started working again after forcibly re-fetching credentials, though it was very confusing why it was trying to contact localhost when the creds were bad.

@plwhite

plwhite commented Mar 19, 2020

Not sure if this is the same problem, but just in case, I hit the following.

I had a kubernetes provider block looking a bit like this.

provider "kubernetes" {
  version                = "1.11"
  host                   = var.credentials.host
  username               = var.credentials.username
  password               = var.credentials.password
  client_certificate     = var.credentials.client_certificate
  client_key             = var.credentials.client_key
  cluster_ca_certificate = var.credentials.cluster_ca_certificate
}

This failed in both 1.10 and 1.11. With 1.10, I got an error explaining that I must set username/password or a bearer token, but not both (fair enough). With 1.11, there was no error, and it ignored host and contacted localhost.

If I removed username and password then it all worked (in both versions). That makes me think a validation failure in 1.11 might lead to it falling through with the host still set to localhost.
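
For reference, the variant that worked is sketched below: the same provider block as above with only the conflicting username/password pair removed, nothing else changed.

provider "kubernetes" {
  version                = "1.11"
  host                   = var.credentials.host
  client_certificate     = var.credentials.client_certificate
  client_key             = var.credentials.client_key
  cluster_ca_certificate = var.credentials.cluster_ca_certificate
}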

@alexsomesan
Member

@plwhite The error you got in 1.10 was not wrong, but it was not exhaustive, since client certificates are also an equivalent form of authentication. Better validation was introduced in 1.11; that's why you are not seeing that error anymore. The rule is to have exactly one of: token, user/pass, OR client certificates. Having two of these, like in your example, is not deterministic (which one should be used to authenticate you?), and it looks like that's not being validated - we'll work on fixing that.

However, the reason you're seeing the connection to localhost is likely because Terraform is unable to resolve the value for var.credentials.host at the right time. How is var.credentials being populated in your case?

@plwhite

plwhite commented Apr 1, 2020

@alexsomesan I was populating var.credentials through variables set up by the azurerm provider creating an AKS cluster, which from memory did have host configured. I'm moderately sure that was set consistently but it's possible there was a transient error where it failed at about the same time as I hit this. Since moving to the more recent kubernetes provider I've seen no further issues, so quite happy to consider this fixed.

@alexsomesan
Member

The key question here is whether you are creating the azurerm cluster resource in the same apply run as the kubernetes resources.

@plwhite

plwhite commented Apr 2, 2020

In the same apply run. Sometimes the azure cluster already existed, and sometimes not (and was created by the apply run).

@muhlba91

I experience similar issues with this setup:

  • terraform bootstraps virtual machines
  • the Terraform RKE provider sets up Kubernetes/RKE
  • I use the attributes retrieved from the RKE provider to set up this kubernetes provider

Interestingly, everything works well if I run terraform apply locally or on any other machine, but once I run it in our CI (GitLab CI and/or Jenkins), I run into the same issue: this provider does not pick up the RKE configuration but instead dials localhost port 80.
For CI we use cytopia:terragrunt (clean run without any caches).

@muhlba91

fyi, my problem was also related to #708 (comment)

My interesting observation, though, was:

  • on local machines a kubeconfig was already present (for clusters other than the one we were creating) and everything was perfectly fine without setting load_config_file in the kubernetes and helm providers
  • on CI obviously no kubeconfig was present at all, and suddenly the provider tried to connect to localhost port 80; after setting load_config_file to false in both the kubernetes and helm providers, it worked (see the sketch below)

Could someone explain to me why it's necessary to set load_config_file = false in one case, but not in the other case where a kubeconfig file already exists? Furthermore, it seems as if the kubeconfig values get overwritten anyway.
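
For illustration, "setting load_config_file to false in both the kubernetes and helm provider" amounts to something like the sketch below. Treat it as a sketch only: the rke_cluster attribute names are placeholders rather than the RKE provider's exact output names, and the helm provider's nested kubernetes block is assumed to follow the 1.x schema of that era.

provider "kubernetes" {
  load_config_file       = false
  host                   = rke_cluster.cluster.api_server_url      # placeholder attribute name
  client_certificate     = rke_cluster.cluster.client_cert         # placeholder
  client_key             = rke_cluster.cluster.client_key          # placeholder
  cluster_ca_certificate = rke_cluster.cluster.ca_crt              # placeholder
}

provider "helm" {
  kubernetes {
    load_config_file       = false
    host                   = rke_cluster.cluster.api_server_url
    client_certificate     = rke_cluster.cluster.client_cert
    client_key             = rke_cluster.cluster.client_key
    cluster_ca_certificate = rke_cluster.cluster.ca_crt
  }
}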

@xp-vit

xp-vit commented May 7, 2020

Had the same issue with version "1.11.2". Solved it the following way:

  1. Downgraded to 1.10.0 and received error:
    Error: Failed to configure: username/password or bearer token may be set, but not both

  2. Removed "username/password" and left only "client_certificate/client_key/cluster_ca_certificate"

  3. Problem solved, and now everything works with "1.11.2".

Enjoy.

@mbelang

mbelang commented Jul 23, 2020

I have the issue with 1.11.4 on EKS. It's as if the provider is initialized with default settings even though, in my module, I'm using the credentials from the EKS cluster. I found no workaround to the issue. This is really frustrating.

I validated that I do not have any other kubernetes provider configured that could override it. I'm still unsure, but it could be related to the fact that I'm using Terragrunt 🤷

@mbelang

mbelang commented Jul 23, 2020

I just tried reverting to version 1.10.0 of the provider. It worked. I managed to create the resources, but the next plan failed with:

Error: namespaces "my_namespace" is forbidden: User "system:anonymous" cannot get resource "namespaces" in API group "" in the namespace "my_namespace"

I guess it is related to EKS RBAC, but how is it possible to avoid the anonymous user without a kubeconfig?

@mbelang

mbelang commented Jul 23, 2020

I managed to make it work with

provider "kubernetes" {
  version                = "~> 1.11.0"
  load_config_file       = false
  host                   = aws_eks_cluster.eks.endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.eks.certificate_authority.0.data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    args        = ["token", "-i", aws_eks_cluster.eks.name, "-r", "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/MyRole"]
    command     = "aws-iam-authenticator"
  }
}

I think I understand what is happening here.

  1. first plan: no resources are created yet, so the token is not required
  2. first apply: the token is generated by
data "aws_eks_cluster_auth" "cluster" {
  name = module.eks.cluster_id
}
  3. second plan: it doesn't work, because the plan will not regenerate the token from the data source

Getting the token on each provider call, as in the solution above, works just fine.
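
For comparison, the same per-call token generation can also be done with the AWS CLI instead of aws-iam-authenticator. This is only a sketch (assuming AWS CLI 1.16.156 or newer is available where Terraform runs; it is not taken from the comment above):

provider "kubernetes" {
  version                = "~> 1.11.0"
  load_config_file       = false
  host                   = aws_eks_cluster.eks.endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.eks.certificate_authority.0.data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", aws_eks_cluster.eks.name]
  }
}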

@etwillbefine

I have the same issue when using Kubernetes Provider > 1.10 (maybe related to #759). Using Provider Version 1.10.0 works as expected. 1.11 and 1.12 do not work with the following config running inside a Kubernetes Cluster:

KUBE_LOAD_CONFIG_FILE=false
KUBERNETES_SERVICE_HOST=<k8s-host>
KUBERNETES_SERVICE_PORT=443

Steps to reproduce:

  1. Create a Pod with the Environment Variables mentioned above (https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs#in-cluster-service-account-token)
  2. Use a minimal Terraform File with a Kubernetes Secret Resource (see below)
  3. Run terraform apply

Results in Error: Post "http://localhost/api/v1/namespaces/default/secrets": dial tcp 127.0.0.1:80: connect: connection refused

provider "kubernetes" {
  version = "~> 1.11"
}

resource "kubernetes_secret" "test" {
  metadata {
    name = "test"
    namespace = "default"
  }

  data = {
    test = "data"
  }
}

I tried to configure the Kubernetes Provider using load_config_file and KUBE_LOAD_CONFIG_FILE. Enabling debug shows the following: [WARN] Invalid provider configuration was supplied. Provider operations likely to fail: invalid configuration: no configuration has been provided

@alexsomesan
Member

@etwillbefine I wasn't able to reproduce the issue with the configuration you provided.
I ran a test inside a Debian container in a Pod on a 1.18 cluster and it worked as expected for me. See the output below.

root@test-708:/test-708# env | grep KUBERNETES | sort
KUBERNETES_PORT=tcp://10.3.0.1:443
KUBERNETES_PORT_443_TCP=tcp://10.3.0.1:443
KUBERNETES_PORT_443_TCP_ADDR=10.3.0.1
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_SERVICE_HOST=10.3.0.1
KUBERNETES_SERVICE_PORT=443
KUBERNETES_SERVICE_PORT_HTTPS=443
root@test-708:/test-708# cat main.tf
provider "kubernetes" {
  version = "~> 1.11"
  load_config_file = "false"
}

resource "kubernetes_namespace" "test" {
  metadata {
    name = "test"
  }
}

root@test-708:/test-708# terraform init

Initializing the backend...

Initializing provider plugins...
- Finding hashicorp/kubernetes versions matching "~> 1.11"...
- Installing hashicorp/kubernetes v1.13.1...
- Installed hashicorp/kubernetes v1.13.1 (signed by HashiCorp)

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
root@test-708:/test-708# terraform version
Terraform v0.13.2
+ provider registry.terraform.io/hashicorp/kubernetes v1.13.1
root@test-708:/test-708# terraform apply -auto-approve
kubernetes_namespace.test: Creating...

Error: namespaces is forbidden: User "system:serviceaccount:default:default" cannot create resource "namespaces" in API group "" at the cluster scope

  on main.tf line 6, in resource "kubernetes_namespace" "test":
   6: resource "kubernetes_namespace" "test" {

@alexsomesan
Member

I'm going to close this issue as it's become a catch-all for credentials misconfigurations.
Please open separate issues if you're having trouble with configuring credentials so we can address them specifically.

@ghost

ghost commented Oct 10, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

@ghost ghost locked as resolved and limited conversation to collaborators Oct 10, 2020