Kube-router unable to connect API server on start because nodelocaldns can't access Service IP provided by kube-router #6175

Closed
rearden-steel opened this issue May 21, 2020 · 8 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

rearden-steel commented May 21, 2020

After deploying a new cluster, we ran into a strange problem: the kube-router pod on some of the nodes gets stuck in CrashLoopBackOff.
The kube-router log shows a timeout connecting to the API server:

E0521 09:25:04.217633    1733 reflector.go:205] github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.Node: Get https://localhost:6443/api/v1/nodes?resourceVersion=0: dial tcp: i/o timeout

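Running strace on kube-router shows that it tries to resolve localhost by querying nodelocaldns at its IP address, and the query times out.
The nodelocaldns logs show that it, in turn, is trying to reach the DNS Service IPs, which are provided by kube-router. Since kube-router is not up yet, those Service IPs are unreachable and the lookup never completes.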
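
To confirm which resolver a process inside the pod will query, one option is to list the nameserver entries in its /etc/resolv.conf. The Go sketch below is only an illustration: the helper name `nameservers` is hypothetical, and the addresses a given cluster shows depend on how nodelocaldns is configured there.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// nameservers returns the addresses found on "nameserver" lines of a
// resolv.conf-style text. Comment lines and other directives are skipped.
// (Hypothetical helper, not part of kube-router or Kubespray.)
func nameservers(resolvConf string) []string {
	var servers []string
	sc := bufio.NewScanner(strings.NewReader(resolvConf))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "nameserver" {
			servers = append(servers, fields[1])
		}
	}
	return servers
}

func main() {
	data, err := os.ReadFile("/etc/resolv.conf")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read /etc/resolv.conf:", err)
		os.Exit(1)
	}
	for _, s := range nameservers(string(data)) {
		fmt.Println("nameserver:", s)
	}
}
```

If the first entry is the nodelocaldns address, every name that is not answered from /etc/hosts first, including localhost in this case, ends up depending on nodelocaldns being able to reach its upstream.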

The problem goes away if 127.0.0.1 is specified instead of localhost in the kube-router kubeconfig, for example with this in the inventory:

kube_apiserver_endpoint: https://127.0.0.1:6443
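
A likely reason the workaround helps is that an IP literal needs no name resolution at all. The Go sketch below (kube-router is written in Go) compares the two cases with a deadline; the helper name `lookup` and the 2-second timeout are assumptions made for illustration, and the result for localhost depends on the environment it runs in.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// lookup resolves host with a deadline and reports how long it took.
// (Hypothetical helper for illustration only.)
func lookup(host string, timeout time.Duration) ([]string, time.Duration, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	start := time.Now()
	// For an IP literal, LookupHost returns the literal itself
	// without consulting /etc/hosts or any DNS server.
	addrs, err := net.DefaultResolver.LookupHost(ctx, host)
	return addrs, time.Since(start), err
}

func main() {
	for _, host := range []string{"localhost", "127.0.0.1"} {
		addrs, took, err := lookup(host, 2*time.Second)
		fmt.Printf("%-10s addrs=%v took=%v err=%v\n", host, addrs, took, err)
	}
}
```

Run inside an affected container, the localhost line would be expected to show the timeout while the 127.0.0.1 line returns immediately.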

Environment:

  • Cloud provider or hardware configuration:
    Bare metal k8s cluster

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):

Linux 4.18.0-147.8.1.el8_1.x86_64 x86_64
NAME="CentOS Linux"
VERSION="8 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="8"
  • Version of Ansible (ansible --version):

  • Version of Python (python --version):

Kubespray version (commit) (git rev-parse --short HEAD):
2.13.0
01dbc90

Network plugin used:
kube-router

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):

Command used to invoke ansible:

Output of ansible run:

Anything else do we need to know:

@rearden-steel rearden-steel added the kind/bug Categorizes issue or PR as related to a bug. label May 21, 2020
@mikesmitty

I ran into this issue as well, with version 2.13.0 on Ubuntu 18.04 nodes.

@qingkunl
Contributor

I hit the same issue.

@qingkunl
Contributor

qingkunl commented Aug 6, 2020

This happens because kube-router uses Alpine as its base image, which does not include /etc/nsswitch.conf. As a result, localhost is not resolved from /etc/hosts. I submitted cloudnativelabs/kube-router#957 to work around this issue.

My kube-router PR has been merged and released in v1.0.1, and #6479 has updated kube-router to v1.0.1 in Kubespray, so this issue should now be fixed.
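
For anyone who wants to check an image for this condition, here is a rough Go sketch that reports whether /etc/nsswitch.conf exists and what its hosts line says. The helper name `hostsSources` is hypothetical, and the parsing is deliberately simple (bracketed actions such as [NOTFOUND=return] are returned as plain tokens). The underlying behaviour, as far as I understand it, is that Go's resolver on Linux follows the glibc default and tries DNS before /etc/hosts when the file is missing, whereas a line such as `hosts: files dns` makes /etc/hosts win.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// hostsSources returns the sources listed on the "hosts:" line of an
// nsswitch.conf-style text, in order (for example ["files", "dns"]).
// It returns nil when there is no hosts line.
// (Hypothetical helper, not taken from the kube-router PR.)
func hostsSources(conf string) []string {
	sc := bufio.NewScanner(strings.NewReader(conf))
	for sc.Scan() {
		line := sc.Text()
		// Drop trailing comments.
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i]
		}
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, "hosts:") {
			return strings.Fields(strings.TrimPrefix(line, "hosts:"))
		}
	}
	return nil
}

func main() {
	data, err := os.ReadFile("/etc/nsswitch.conf")
	if os.IsNotExist(err) {
		fmt.Println("/etc/nsswitch.conf is missing: the case described in this issue")
		return
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read /etc/nsswitch.conf:", err)
		os.Exit(1)
	}
	fmt.Println("hosts lookup order:", hostsSources(string(data)))
}
```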

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 4, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 4, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

