Fail to init aws cluster with the message "could not init cloud provider "aws": error finding instance ... timeout #2359

Closed
arrcher opened this issue Dec 3, 2020 · 2 comments
Labels
kind/support Categorizes issue or PR as a support question.

@arrcher

arrcher commented Dec 3, 2020

What keywords did you search in kubeadm issues before filing this one?

could not init cloud provider

kubeadm init --cloud-provider aws

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version: v1.19.4

Environment:

  • Kubernetes version: v1.19.4
  • Cloud provider or hardware configuration: aws
  • OS: Red Hat Enterprise Linux, 8.3 (Ootpa)
  • Kernel: Linux ip-10-83-62-10.ec2.internal 4.18.0-240.1.1.el8_3.x86_64 #1 SMP Fri Oct 16 13:36:46 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Others:
    config file:
cat ./kubeadm.yaml
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
apiServer:
  extraArgs:
    cloud-provider: aws
clusterName: cdspidr
controlPlaneEndpoint: ip-10-83-62-10.ec2.internal
controllerManager:
  extraArgs:
    cloud-provider: aws
    configure-cloud-routes: "false"
kubernetesVersion: stable
networking:
  dnsDomain: cluster.local
  podSubnet: 10.83.62.0/24
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
nodeRegistration:
  kubeletExtraArgs:
    cloud-provider: aws
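kubeadm forwards each entry under extraArgs (and kubeletExtraArgs) to the corresponding component as a --key=value command-line flag. The Python sketch below illustrates that mapping for the controllerManager section above. The helper name extra_args_to_flags is hypothetical, and the key sorting is only there for deterministic output; it is not a claim about kubeadm's internals.

```python
from typing import Dict, List


def extra_args_to_flags(extra_args: Dict[str, str]) -> List[str]:
    """Render an extraArgs-style mapping as --key=value flags (hypothetical helper)."""
    # Sort keys so the resulting flag list is stable between runs.
    return [f"--{key}={value}" for key, value in sorted(extra_args.items())]


# The controllerManager.extraArgs from the config above.
controller_manager_args = {"cloud-provider": "aws", "configure-cloud-routes": "false"}
print(extra_args_to_flags(controller_manager_args))
# ['--cloud-provider=aws', '--configure-cloud-routes=false']
```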

What happened?

sudo kubeadm init --config=kubeadm.yaml -v=5 > ./kubeadm-run.txt 2>&1

kubeadm never fully initializes. The output shows:

[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
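The [kubelet-check] lines come from kubeadm repeatedly probing the kubelet's local health endpoint until it answers or a timeout passes. A "connection refused" on port 10248 means nothing is listening there, which most likely means the kubelet process has already exited (here, apparently because of the fatal cloud-provider error in the journal below). A minimal Python sketch of that polling pattern, assuming only the URL shown in the log; wait_for_healthz, its parameters, and the retry interval are hypothetical and are not kubeadm's actual implementation:

```python
import time
import urllib.error
import urllib.request


def wait_for_healthz(url: str, timeout_s: float, interval_s: float = 1.0) -> bool:
    """Poll url until it returns HTTP 200 or timeout_s elapses (hypothetical helper)."""
    # The health endpoint is local, so ignore any *_PROXY environment variables.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener.open(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused (nothing listening), timeout, or a non-2xx reply.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Example: wait_for_healthz("http://localhost:10248/healthz", timeout_s=40)
```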

and journalctl -xeu kubelet shows

12025 aws.go:1235] Building AWS cloudprovider
Dec 03 14:34:29 ip-10-83-62-10.ec2.internal kubelet[12025]: I1203 14:34:29.111272   12025 aws.go:1195] Zone not specified in configuration file; querying AWS metadata service
Dec 03 14:36:29 ip-10-83-62-10.ec2.internal kubelet[12025]: F1203 14:36:29.464611   12025 server.go:265] failed to run Kubelet: could not init cloud provider "aws": error finding instance : \"RequestError: send request failed\\ncaused by: Post \\\"https://ec2.us-east-1.amazonaws.com/\\\": dial tcp 10.83.60.25:443: i/o timeout\""
Dec 03 14:36:29 ip-10-83-62-10.ec2.internal kubelet[12025]: goroutine 1 [running]:

The configs in /etc/manifests/ contain env settings with the proxy variables.

However, from a terminal with the same proxy settings,
curl -v https://ec2.us-east-1.amazonaws.com/ does not time out and returns data.
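One possible explanation (an assumption, not confirmed in this thread): environment variables in static Pod manifests apply to the containers those manifests define (kube-apiserver, kube-controller-manager, and so on), not to the kubelet, which kubeadm runs as a systemd service. A client only uses a proxy if the proxy variables are in its own environment, so curl in an interactive shell can succeed while the kubelet dials the endpoint directly. The error above mentions a plain "dial tcp ...:443" rather than a proxy connection, which would be consistent with that. The sketch below is a simplified model of how a client picks a proxy from its environment; effective_proxy and the proxy URL http://proxy.example.internal:3128 are hypothetical, and real clients (including Go's net/http) apply more detailed NO_PROXY matching rules.

```python
from typing import Mapping, Optional
from urllib.parse import urlsplit


def effective_proxy(env: Mapping[str, str], url: str) -> Optional[str]:
    """Return the proxy a client would use for url, or None for a direct dial.

    Simplified model: HTTPS_PROXY/https_proxy for https URLs, HTTP_PROXY/http_proxy
    for http URLs, and NO_PROXY/no_proxy as comma-separated host suffixes ("*" = all).
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    scheme = parts.scheme.lower()

    def lookup(name: str) -> str:
        # Prefer the upper-case variable, fall back to the lower-case one.
        return env.get(name.upper()) or env.get(name.lower()) or ""

    proxy = lookup(f"{scheme}_proxy")
    if not proxy:
        return None  # No proxy configured: the client connects directly.

    for entry in lookup("no_proxy").split(","):
        entry = entry.strip().lower().lstrip(".")
        if not entry:
            continue
        if entry == "*" or host == entry or host.endswith("." + entry):
            return None  # Host is excluded from proxying.
    return proxy


shell_env = {"HTTPS_PROXY": "http://proxy.example.internal:3128"}  # hypothetical proxy
service_env: dict = {}  # e.g. a service started without the proxy variables
url = "https://ec2.us-east-1.amazonaws.com/"
print(effective_proxy(shell_env, url))    # http://proxy.example.internal:3128
print(effective_proxy(service_env, url))  # None, i.e. a direct dial
```

If this is the cause, the proxy variables would need to be present in the kubelet service's own environment, with NO_PROXY covering addresses that must not go through the proxy.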

What you expected to happen?

I expected kubeadm init --cloud-provider aws to complete successfully.

How to reproduce it (as minimally and precisely as possible)?

Anything else we need to know?

It's an entirely private VPC.

@arrcher arrcher changed the title Fail to init aws cluster with the message "could not init cloud provider "aws": error finding instance : \"RequestError: send request failed\\ncaused by: Post \\\"https://ec2.us-east-1.amazonaws.com/\\\": dial tcp 10.83.60.25:443: i/o timeout\""" Fail to init aws cluster with the message "could not init cloud provider "aws": error finding instance ... timeout Dec 3, 2020
@neolit123
Member

neolit123 commented Dec 3, 2020

hello,

kubeadm seems to be doing its job to pass the flags to the components that need them, so this doesn't look like a kubeadm issue. also note that the kubeadm team does not have e2e signal for any of the legacy (non-external) cloud providers in kubernetes and we do not know if they are working as expected.

this guide on the web seems to suggest that what you are doing is valid for AWS:
https://blog.scottlowe.org/2019/08/14/setting-up-aws-integrated-kubernetes-115-cluster-kubeadm/

Dec 03 14:36:29 ip-10-83-62-10.ec2.internal kubelet[12025]: F1203 14:36:29.464611 12025 server.go:265] failed to run Kubelet: could not init cloud provider "aws": error finding instance : "RequestError: send request failed\ncaused by: Post \"https://ec2.us-east-1.amazonaws.com/\": dial tcp 10.83.60.25:443: i/o timeout"

this looks like a connectivity issue on the side of the kubelet.
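To separate a network-level problem from a proxy problem, one could try a direct TCP dial (no proxy involved) from the node to the address in the error, with an explicit timeout. A minimal Python sketch; can_dial is a hypothetical helper, and a False result when run on the node against the address from the log would be consistent with the "i/o timeout" above.

```python
import socket


def can_dial(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Try a direct TCP connection (no proxy), like the kubelet's failing dial."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers timeouts (socket.timeout is an OSError subclass) and refusals.
        return False


# Example, run on the node: can_dial("10.83.60.25", 443)
```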

before logging a ticket in kubernetes/kubernetes about this and tagging it with /sig node cloud-provider, i'd encourage you to try to get feedback from other users on our support channels like slack, stackoverflow, discuss:
https://github.com/kubernetes/kubernetes/blob/master/SUPPORT.md

thanks
/kind support
/close

@k8s-ci-robot k8s-ci-robot added the kind/support Categorizes issue or PR as a support question. label Dec 3, 2020
@k8s-ci-robot
Contributor

@neolit123: Closing this issue.

